The word induction refers to any thought pattern which moves from specific examples to more general principles. The best known example of induction is probably the scientific method – we collect data by repeating the same specific experiments multiple times, find regularities and make guesses about a potential overarching framework into which the data might fit. This is what experimental scientists do, broadly speaking. While there is an analogy between this kind of inductive reasoning and the mathematical tool known commonly as mathematical induction, these are actually quite different. To explain the concept behind the mathematician’s version of induction, it will be helpful to think of dominoes.
Imagine you have a line of dominoes, as long as you’d like it to be, and that you are standing next to the first domino. Imagine you have lined up the dominoes so that each domino is very close to the one in front of it, so that any domino falling forwards will knock over the next one. Now, common sense tells us that if we knock over the first domino, then every domino will fall. It’s a cascading effect.
This is the idea behind the proof technique called mathematical induction. We’ll take a little time to flesh it out in detail before we put it to use, but when the math starts to show up, the analogy of the line of dominoes is quite helpful for conceptual clarity. In the domino situation, the two conditions that we needed to know about could be phrased as follows:
(1) Domino 1 fell over.
(2) If Domino $n$ falls over, then so will Domino $n+1$ (since Domino $n$ knocks it over).
Our conclusion from these points is that every domino will fall. Domino 1 will knock over Domino 2, which will knock over Domino 3, which will knock over Domino 4, and so on. Using the same concept in a broader setting, suppose that we have some kind of statement $P(n)$ which depends on some number $n$. For example, we could use $P(n)$ as a shortcut for writing “$n$ is even” or “$n$ is a prime number.” We can use this shorthand to ask which values of $n$ cause $P(n)$ to be true, and which do not. In the domino example, our statement $P(n)$ would be “Domino $n$ will fall over.” Using $P(n)$ as a placeholder for the statements about dominoes, we can rewrite (1) and (2) in the following way.
(1) $P(1)$ is true.
(2) If $P(n)$ is true, then $P(n+1)$ is also true.
Our conclusion is that $P(n)$ is true for every positive whole number $n$. This even works if we have an infinite line of dominoes, because any particular domino you might choose will eventually be knocked over.
To give another analogy, imagine a tower with a ground floor labeled floor 1. Consider what would happen to (1) and (2) if you were to define $P(n)$ to mean that “floor $n$ is supported by the floors beneath it.” This provides yet another analogy for how the proof method I am describing works – the lower floors support the higher floors. I won’t write out this analogy in as much detail, but pausing to process this second analogy might be helpful in understanding the discussion to come.
The two steps we keep mentioning make up the technique of mathematical induction (which I will now just call induction for simplicity). There are no additional bells or whistles to it – all we need to know (once we know what $P(n)$ is, of course) is whether (1) and (2) are true or not. Usually (1) is called the base case, since we can think of (1) as the foundation or base from which we build upwards (think of the building analogy), and (2) is called the inductive step.
To show how this kind of proof actually works, I will walk through the most common first example used in math classrooms when induction is taught – a shortcut for calculating $1 + 2 + \dots + n$.
Theorem: For every positive whole number $n$, the formula
$$1 + 2 + \dots + n = \frac{n(n+1)}{2}$$
is true.
Before I move on to the proof, I want to put the question into the same language as I have been using, with $P(n)$ and all that. We can define the claim $P(n)$ to be the claim that $1 + 2 + \dots + n = \frac{n(n+1)}{2}$. Thus, to prove this claim, we have two steps we must accomplish. The base case is to show that $P(1)$ is true, and the inductive step is to show that we can use the truth of $P(n)$ to prove the truth of $P(n+1)$. Now, I won’t be writing $P(n)$ in the actual proof; I will leave that as a mental exercise for you all. This paragraph should provide enough clarity to aid in understanding how the proof I’m about to write relates to the analogies of the dominoes and the building.
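Proof: For the value $n = 1$, we have $\frac{n(n+1)}{2} = \frac{1 \cdot 2}{2} = 1$, and the sum of all numbers up to 1 is just 1. Therefore, the base case is true.
Suppose now that the theorem is true for a particular value of $n$, so that the equation
$$1 + 2 + \dots + n = \frac{n(n+1)}{2}$$
is known to be true. We would like to show that the theorem is true for the value $n+1$, that is, that the equation
$$1 + 2 + \dots + n + (n+1) = \frac{(n+1)(n+2)}{2}$$
is true (think about what $P(n+1)$ is to understand the above formula). To see that this is true, we can use our previous assumption of the truth of the equation for $n$ to see that
$$1 + 2 + \dots + n + (n+1) = \frac{n(n+1)}{2} + (n+1).$$
The right-most part of the equation we have just derived can be rewritten using a ‘common denominator’ (since 2/2 = 1):
$$\frac{n(n+1)}{2} + (n+1) = \frac{n(n+1)}{2} + \frac{2(n+1)}{2} = \frac{n(n+1) + 2(n+1)}{2}.$$
We can also observe by using the “foiling method” that $(n+1)(n+2) = n^2 + 3n + 2 = n(n+1) + 2(n+1)$, and therefore we can see that
$$\frac{n(n+1) + 2(n+1)}{2} = \frac{(n+1)(n+2)}{2}.$$
Following all of the equalities shows that the equation $1 + 2 + \dots + n + (n+1) = \frac{(n+1)(n+2)}{2}$ is true.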
We have proven both steps, and therefore the original claim is itself always true. So, we are done with the proof.
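If you would like to see the two steps of the proof in action, here is a small Python sketch. It is only a numerical sanity check for the first thousand values, not a replacement for the proof, and the function names are my own choices for illustration.

```python
def sum_by_adding(n: int) -> int:
    """Add 1 + 2 + ... + n one term at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_by_formula(n: int) -> int:
    """The shortcut from the theorem: n(n+1)/2."""
    return n * (n + 1) // 2


# Base case: the formula gives the right answer for n = 1.
assert sum_by_formula(1) == 1

# Inductive step, checked numerically: adding (n + 1) to the formula
# for n lands exactly on the formula for n + 1.
for n in range(1, 1001):
    assert sum_by_formula(n) + (n + 1) == sum_by_formula(n + 1)
    assert sum_by_adding(n) == sum_by_formula(n)

print(sum_by_formula(100))  # 5050
```

The loop can only ever check finitely many dominoes. The induction proof is what guarantees that all of them fall.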
The reason induction is worth learning is precisely for problems like this – where there is a kind of one-after-the-other connection between certain parts of the problem. When things are built out of smaller parts that look a lot like the bigger part, the technique of induction is often quite useful. It does take some time to build up intuition for this one though. And you do have to be extremely careful how you use it (look up the “all horses are the same color paradox” to see a misuse of induction if you are curious).
In light of the current COVID-19 pandemic, I have written an article entitled How Does Disease Spread? in which I discuss how mathematicians attempt to understand the way that a virus is spreading in the population. In said article, my primary focus is building up the concepts and thought processes that are needed to understand how a mathematician would come up with equations to simulate a virus spreading. I now want to write a brief follow-up addressing a related question:
In light of the behavior of viruses, how can we determine whether things are improving or getting worse?
This question actually isn’t quite as easy as it seems. The initial, intuitive response would be to look at the current number of contagious people, and to call any increase in that number bad and any decrease good. Of course, it is a bad thing when more people are becoming ill and it is a great thing when the total number of infectious people is going down, but that is actually not a very good indicator. The reason, as I will try to explain in this article, is that the improvement actually starts to happen before the number of active cases starts going down.
A Brief Review of How Viruses Spread
Because viruses spread through direct or indirect personal contact, it is absolutely fundamental that at the beginning of the spread of a virus, the virus grows at an exponential rate – which basically just means that the speed at which it spreads is directly related to the number of people that have the virus. This is why numbers have been changing so drastically. When more people are infected, there are more ‘opportunities’ for others to also get infected, and so the number of new infections keeps going up.
Mathematicians recognize exponential equations as one of the most important of all equations, not only because they appear in numerous real-life situations like virus spread and population growth, but also because they are interesting in their own right. Exponential equations are so important that there is a notation developed specifically to help us write them down – normally we just use the term exponents. As an example of this form of writing, the number $2^2$ just means $2 \cdot 2$, the number $2^3$ means $2 \cdot 2 \cdot 2$, and more broadly, the number $2^n$ means 2 multiplied by itself $n$ times. There is nothing special about the 2 either – if $x$ is any number, writing $x^n$ just means multiplying $x$ by itself a total of $n$ times.
Exponential Growth is Highly Sensitive
The number of cases of a virus always begins with exponential growth. But eventually, there are factors that slow this down. This is the point of social distancing and all travel shutting down – these measures slow things down. And as it turns out, even tiny changes produce massive shifts in the outcome of exponential equations. To see why, we can use made-up viruses with made-up equations. Let’s say some evil genius created two new viruses – let’s call them Virus A and Virus B. Since he is a genius, after all, he is able to figure out the equations that tell him how the viruses will spread – which are exponential. He finds that if we begin at “Day Zero” with only one person affected by Virus A, then after $n$ days, there will be about $1.1^n$ people infected by Virus A. He does the same thing for Virus B, and the equation turns out as $1.15^n$.
At first, his reaction might be “Well, A and B are almost identical! So I can use either one to wreak havoc.” But if he looks a little closer, these actually turn out to be massively different, even though 1.1 and 1.15 are nearly the same number. To see this, we will compare the value after one month – 30 days – of infection. The two equations give values of $1.1^{30} \approx 17$ and $1.15^{30} \approx 66$. If you went for 100 days, then Virus A has infected a little over 10,000 people, but Virus B has already infected more than 1,000,000 people!
This is one of the fundamentally important aspects of exponential growth that mathematicians have understood for centuries – changing the number ‘on the bottom’ by even the tiniest amount has immense consequences. As a side note, this is why everything each of us does during this crisis matters so much, and this is why we are taking such drastic measures. Because of what a virus is and how it spreads, mathematics tells us that even small improvements in sanitation can, over the course of a month or two, make a difference in the thousands or even the millions.
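To see the sensitivity for yourself, here is a short Python sketch comparing the two made-up viruses. It assumes daily multipliers of 1.1 for Virus A and 1.15 for Virus B, starting from a single case on Day Zero; the function name is just an illustrative choice.

```python
def infected(base: float, days: int) -> float:
    """People infected after `days` days, starting from one case on Day Zero."""
    return base ** days


for days in (30, 100):
    virus_a = infected(1.1, days)
    virus_b = infected(1.15, days)
    print(
        f"day {days}: Virus A ~ {virus_a:,.0f}, "
        f"Virus B ~ {virus_b:,.0f}, "
        f"B is ~{virus_b / virus_a:,.1f} times A"
    )
```

Try changing 1.15 to 1.12 or 1.2 and watch how much the day-100 number moves.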
How Things Get Better
This is not exactly obvious from our discussion so far, but not all exponential equations become large rapidly – some of them actually become small very rapidly. The ideas we need are summarized in 3 bullet points:
Multiplying a number by 1 doesn’t change anything.
Multiplying a number by something between 0 and 1 makes it smaller.
Multiplying a number by something larger than 1 makes it bigger.
Since exponential equations are all about multiplying numbers together, this matters quite a lot. If $x$ is bigger than 1, the value of $x^n$ will eventually become astronomically large if you give it enough time, and if $x$ is between 0 and 1, the value of $x^n$ will, if you give it enough time, eventually become indistinguishable from zero. The moral of all of this is that for exponential equations, what matters most is not increasing or decreasing, but whether the number we are multiplying by is bigger than 1 or less than 1. To interpret this in terms of the spread of the COVID-19 virus, what matters is not the current number of cases, but the number of new cases every day. If there are fewer new cases today than yesterday, that is what is really important. And more than that – when you compare two days, you use division, not subtraction, because exponential equations are built out of multiplication rather than addition. What we really care about is the value
$$\frac{\text{new cases today}}{\text{new cases yesterday}}$$
and our goal is to make this number smaller than 1. Of course, on a day-to-day basis, this number – called the growth factor – may change quite a bit. But if we have several days in a row of a growth factor less than 1, that is a strong sign that the situation has entered the stage of getting better rather than worse. And once the growth factor becomes less than 1, it is only a matter of time until the peak number of infections will be in the past and we will be on our way to a recovery from the pandemic.
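As a rough illustration of how the growth factor is computed, here is a Python sketch. The daily counts are invented purely for this example and do not come from any real data set, and the function name is my own.

```python
def growth_factors(new_cases: list[int]) -> list[float]:
    """Divide each day's new cases by the previous day's new cases.

    Days following a day with zero new cases are skipped, since the
    ratio is undefined there.
    """
    return [
        today / yesterday
        for yesterday, today in zip(new_cases, new_cases[1:])
        if yesterday > 0
    ]


# Made-up daily counts of NEW cases, for illustration only.
new_cases = [100, 120, 150, 165, 160, 150, 135]
factors = growth_factors(new_cases)

for day, factor in enumerate(factors, start=1):
    trend = "growing" if factor > 1 else "shrinking"
    print(f"day {day}: growth factor {factor:.2f} ({trend})")
```

In this invented data the number of new cases is still large on the last few days, yet the growth factor has already dropped below 1, which is exactly the early sign of improvement described above.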
Conclusion and Clarification
Obviously, this is a simplification. I am not an epidemiologist, and I have never professionally studied the models that professionals are using to model this virus. However, I can say as a mathematician that the growth rate absolutely does matter, and if you see things about the growth rate in the news or on websites that are tracking the statistics of the virus, there is a good reason those numbers are there. They help us a lot in understanding whether things are getting better or worse.
As I become more aware of the mathematics myself, I will continue to write more about what is going on and how professionals are understanding all the numbers. For now, I echo the advice that governments around the world have put forward – stay inside as much as possible, stay sanitary, and seek medical advice if you show any symptoms. Let’s beat this together.
These are difficult times. Because of the spread of the COVID-19 virus, we are all taking very drastic measures to ‘flatten the curve’ by socially distancing, increasing our awareness of hygiene, and many other measures. These, of course, should be done, because lives will be saved. But how, exactly, do we know this? We don’t quarantine for every disease; even something as horrible as Ebola never led to any measures as drastic as those we now face. The question I want to address here is why this is so – how do we know, mathematically, when things are getting out of control? And how do mathematicians understand and calculate the way diseases spread? I will be trying to explain some of the basics of how this virus spreads.
I will also add some additional resources that I really like that discuss mathematics in ways that I find very helpful. In particular, I highly recommend the videos on the YouTube channel 3Blue1Brown. This channel does an amazing job with visual displays, far better than I could ever do, and very effectively explains the concepts behind the models.
Throughout this article, the concept of a mathematical model is absolutely central. A mathematical model is created by studying real-world patterns of something (disease spread, economic growth, population growth, gravity, anything at all really) and then attempting to capture the patterns we see using equations that mathematicians know have some of the same patterns.
The reason we make models is that they enable us to make some predictions about the future. In many cases, we can actually learn more about the real-world events we are modelling than would have been possible by mere observation. Many of the sciences, especially physics, chemistry and biology, very frequently use mathematical models to aid in understanding their fields.
Models can be extremely different from one another, as the model depends vitally on what is being modeled. Since this is so, we really will only focus on the ideas that play an important role in modeling the ways that diseases spread.
The first thing to talk about is the exponential growth of diseases. Broadly speaking, when we say that a number is growing exponentially, we mean that the speed at which this number is growing is directly related to the number itself. The most common example of this kind of growth that we experience lies in population growth. The number of people in the next generation depends directly on how many people there are now, and you would find the number of people in the next generation by multiplying the current number of people by the average number of children per person.
Let’s construct an example using some numbers that will be easier to calculate with. Suppose that an isolated island has a population of just two rabbits, and that every year, every pair of rabbits have twins. After one year, there would be four rabbits. After two years, each of the two pairs has twins, and so there are now eight rabbits. When you continue the pattern, every year the number of rabbits doubles – the pattern becomes
2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, …
Something important to notice here – both of these lists go up to 12 years. Notice how massive a difference we have from doubling to tripling: we have gone from about 4000 rabbits to more than 300,000! This discrepancy keeps getting worse and worse as time goes on. To see why, we can describe both of these growth models with equations. Since in these lists, we obtain each entry by multiplying the previous entry by a growth factor (2 and 3, respectively, for doubling and tripling), our lists actually look like this:
In equation form, after years, the first list has entry , and the second list has entry . If we want to know the discrepancy from year to year as a percentage, we can calculate the ratio of the two:
If we plug in , for example this ratio is 1.5, which means that there are 50% more rabbits in the second list than in the first list. In the calculation I made earlier at year 12, the ratio is . This ratio will get bigger and bigger as the number of years grows.
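The doubling and tripling lists are easy to regenerate with a few lines of Python. This is a sketch under my own assumptions: both lists start with the same two rabbits and have twelve entries, which is how I read the figures of "about 4000" and "more than 300,000" quoted above. The function name is hypothetical.

```python
def growth_list(start: int, factor: int, entries: int) -> list[int]:
    """Begin at `start` and multiply by `factor` to get each later entry."""
    values = [start]
    for _ in range(entries - 1):
        values.append(values[-1] * factor)
    return values


doubling = growth_list(start=2, factor=2, entries=12)
tripling = growth_list(start=2, factor=3, entries=12)

print(doubling)
print(tripling)

# The ratio between the lists is itself exponential: it is multiplied
# by 3/2 = 1.5 every time we move one entry to the right.
for step, (d, t) in enumerate(zip(doubling, tripling)):
    print(f"step {step}: ratio = {t / d:.2f}")
```

Because the ratio is multiplied by 1.5 at every step, the gap between the two lists never stops widening.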
Why Exponential Growth Matters for COVID-19
When we model a virus, exponential growth plays a front and center role. The reason for this is the way viruses spread – namely, from person to person. To oversimplify, if a disease spreads in such a way that every sick person infects one other person a week, and right now there are two people sick with this disease, the list of the number of sick people after each week is the same list as for the doubling of the population of rabbits:
2, 4, 8, 16, 32, …
This is because after one week, the initial two sick people each infected one person, and that means two new infected people, for a total of 4. The next week, each of the 4 infects one new person, so we have 4 new infected people for a total of 8 infected people, and so on.
This is the nature of how viruses spread. We can use this idea to develop our very first mathematical model of a virus (which will need to be improved later). If a sick person infects $r$ people on average every week, and if there is currently just one sick person, then the equation $(1+r)^n$ tells us the approximate number of people that will be sick $n$ weeks from today. If there are currently $N$ people sick, we can modify this equation to $N(1+r)^n$.
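Here is a minimal Python sketch of this first model. It assumes that nobody recovers and that every sick person infects the same number of new people each week, so the number of sick people is multiplied by one plus that number every week. The names `simulate_weeks`, `closed_form`, `initial_sick` and `r` are my own labels for illustration.

```python
def simulate_weeks(initial_sick: int, r: int, weeks: int) -> int:
    """Step through the weeks one at a time, with no recoveries."""
    sick = initial_sick
    for _ in range(weeks):
        sick += r * sick  # every sick person infects r new people
    return sick


def closed_form(initial_sick: int, r: int, weeks: int) -> int:
    """The same count as a single exponential expression."""
    return initial_sick * (1 + r) ** weeks


# Two sick people, each infecting one other person per week:
print([simulate_weeks(2, 1, w) for w in range(6)])  # [2, 4, 8, 16, 32, 64]

# The week-by-week simulation and the one-line formula agree.
assert all(
    simulate_weeks(2, 1, w) == closed_form(2, 1, w) for w in range(20)
)
```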
Making the Model Better
It is important to point out that this is an oversimplification. If you look at real-life instances of disease spread, you will find that this equation does a pretty good job of predicting things at the beginning, but becomes wildly inaccurate later on. This is because, eventually, the disease begins to decrease, but an exponential model never does. This is the right place to begin, but we need to make some changes. As a mathematician would, we must ask ourselves why it is that diseases eventually slow down, and once we know why, we can modify our equations.
There are two big errors we have made that need to be accounted for in order to make our model more accurate. The first thing we didn’t think about is the fact that people with a disease will either recover or die, and in both cases they are no longer contagious. This will decrease the overall number of contagious people, and so will slow down somewhat the exponential increase. However, it turns out that even when you account for this, you still have a kind of exponential increase. Suppose for example that every sick person is sick for 2 weeks and infects one person per week. Then if we start with one sick person, after 1 week there will be two. During the second week, two new people are infected, but one person recovers, and so we now have three infected people. On the third week, then, three new people are infected, and the one person who became sick during the first week recovers, and so we are now at 5 infected people. After the fourth week, there will be 5 new infections and 2 people recovered, and so there will be 8 infected people. In general, the value for the number of infections after $n$ weeks is obtained by adding together the values from the two previous weeks. The result of this is the famous Fibonacci sequence:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, …
This is an improvement, but this is still too fast – mathematicians know that the Fibonacci sequence grows very much like the exponential function $1.618^n$. So, there must be something else we are missing.
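The two-week illness story can be simulated directly. The Python sketch below tracks two groups, people in their first week of illness and people in their second week, under the assumptions stated above (everyone is sick for exactly two weeks and infects one person per week). The function and variable names are my own.

```python
def infected_by_week(weeks: int) -> list[int]:
    """Currently sick people at week 0, 1, ..., `weeks`."""
    older, newer = 0, 1  # week 0: a single person, newly sick
    totals = [older + newer]
    for _ in range(weeks):
        new_cases = older + newer        # every sick person infects one person
        older, newer = newer, new_cases  # old group recovers, new group ages
        totals.append(older + newer)
    return totals


totals = infected_by_week(11)
print(totals)  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]

# Week-over-week growth factor settles near the golden ratio (about 1.618).
for previous, current in zip(totals[-4:], totals[-3:]):
    print(f"{current} / {previous} = {current / previous:.4f}")
```

So even with recoveries included, the count is still being multiplied by a fixed number bigger than 1 each week, which is the signature of exponential growth.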
The second detail we are missing is that there are only so many people out there, and that once people are no longer contagious, they don’t get sick again. This detail is really what reins in the growth of a disease. It will take a little bit of time to factor these into our equations for the virus.
The SIR Model of Disease Spread
Using the modifications we have just made, we arrive at the SIR-model for disease spread. SIR is an acronym for the 3 ‘types of people’ that the model takes into account. S stands for the susceptible people – those who have not yet become sick. The I stands for the infected people, those who are currently contagious (even if they are not showing symptoms). The R stands for the recovered people, by which we mean people who have had the disease and are no longer contagious. Notice that this would include people who have died, since they are no longer causing others to be infected.
The term ‘recovered’ is then a little misleading. You can make more complicated models that take into account the real differences between those who die and those who make a full recovery – namely that those who die from a disease likely are in a hospital, and for this and other reasons likely cause fewer people to get infected than someone with only a mild case. For the sake of simplicity, we will ignore this difference. A mathematician working on a problem like this would take into account way, way more factors than we have included here, and would probably create more categories as well (like people who are infected but haven’t shown symptoms yet), but as it turns out the SIR-model will give us pictures basically identical to the curves we are supposed to be flattening, and so we won’t make things any more complicated than we need to.
The Mathematical SIR-Model
In order to write down this model, we need to define some notation. First, we will have our three variables, $S$, $I$, and $R$, as defined earlier, to count the susceptible, infected, and recovered people. We will also need two constants $a$ and $b$. The constant $a$ will be a measure of how quickly the disease is transmitting from person to person, and $b$ will measure how quickly people recover from the disease once they get it.
We are modeling how quickly the disease is spreading, so we need a way to write down the ‘speed’ of a variable instead of just its value. The way we usually write this in mathematics is $S^\prime$ (read “S prime”). The variable $S^\prime$ will measure how quickly $S$ is changing, $I^\prime$ measures how quickly $I$ is changing, and $R^\prime$ measures how quickly $R$ is changing.
Now, we can build our model with these variables. In order to make sense of the model, we first will describe the conceptual process that we are modeling. In the SIR-model, there are only two types of changes we must consider – we care about how many susceptible people become ill, and how many ill people recover. You can picture this process with the diagram below:
$$S \longrightarrow I \longrightarrow R$$
If we can understand how both of the arrows work, then we can build all of our equations out of this diagram. The second of the two arrows is the easiest to grapple with. If we have a population of people infected by a disease that normally lasts for two weeks, then on an average day 1 out of 14 of them will recover. This recovery rate is exactly what we meant by the constant $b$ earlier, and this constant will depend on how long the disease lasts, as well as on how successful the medical system is at speeding up recovery. So, on a given day, a certain percentage of all infected people move from the $I$ category into the $R$ category. This percentage is measured by $b$, and the number of people will be $bI$. So, the arrow $I \to R$ has associated with it the value $bI$. Since the process of recovery will cause $I$ to go down, we will have to subtract $bI$ from $I^\prime$ (going down = negative change = negative value). For the same type of reason, we will have to add $bI$ to the $R^\prime$ value.
The first arrow is a bit more complicated. There are three primary factors to consider in how many people move from being susceptible to being infected. Firstly, we require the ‘transmission constant’ $a$ from earlier, which indicates how contagious the disease is from person to person, as well as how much person to person interaction there is in public. We can also observe intuitively that larger populations mean more people will get sick and that more infected people will increase the ‘number of opportunities’ people have to become sick. When you factor all of this in, the process of $S \to I$ will have a value of $aSI$ associated with it. For the same reasons as in the previous paragraph about increases and decreases, we will subtract this value from our $S^\prime$ equation and add it to our $I^\prime$ equation.
We have now arrived at the SIR-model, whose equations are now listed below:
$$S^\prime = -aSI$$
$$I^\prime = aSI - bI$$
$$R^\prime = bI$$
To give a verbal interpretation of all of this, the equations mean that on a daily basis, the number of people who are susceptible has decreased by $aSI$, the number of sick people who are no longer sick is $bI$, and the number of people who are currently sick has changed by the value $aSI - bI$, which could possibly be positive (more sick people) or negative (fewer sick people).
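For readers who want to experiment, here is a rough Python sketch that steps the SIR equations forward in time using small time steps (Euler's method). Several things here are my own assumptions rather than part of the model description: $S$, $I$ and $R$ are measured as fractions of the population, the parameter values are invented, and the function name `simulate_sir` is hypothetical.

```python
def simulate_sir(a, b, s0, i0, r0, days, steps_per_day=10):
    """Return a list of (S, I, R) values, one entry per day.

    S, I, R are fractions of the population (an assumption of this sketch).
    Each day is split into small steps so the changes stay small.
    """
    s, i, r = float(s0), float(i0), float(r0)
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = a * s * i * dt   # the S -> I arrow
            new_recoveries = b * i * dt       # the I -> R arrow
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only: a = 0.5, and b = 1/14 for a two-week illness.
history = simulate_sir(a=0.5, b=1 / 14, s0=0.999, i0=0.001, r0=0.0, days=120)

for day in range(0, 121, 20):
    s, i, r = history[day]
    print(f"day {day:3d}: S = {s:.3f}, I = {i:.3f}, R = {r:.3f}")
```

Notice that every person who leaves one category enters another, so the three numbers always add up to the whole population.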
What Does SIR Predict?
Now, the curves given to us by the SIR-model are in fact the curves that we have been seeing everywhere in the news. Here is an image of how all three variables change in a certain situation.
The curve we have been seeing the most of is the green one. This is a familiar shape to us now; this is the curve we are supposed to flatten.
What is ‘Flattening the Curve’ Mathematically?
The constant press to ‘flatten the curve’ is everywhere now. But what does this actually mean in terms of the model, and how do we accomplish it?
One essential property of a model is that the shapes of the graphs are totally determined by the initial conditions (the ‘start values’ for all of the variables) and the value of the parameters. In order to change the shapes of the curve, then, we must either change the initial values of $S$, $I$, and $R$ or we must change the constants $a$ and $b$. In the case of a disease, the initial values of the variables only have one option – one person infected, none recovered, and everyone else susceptible. That’s it. So, if we want to flatten the curve, we must do something to alter the values of $a$ and $b$. So, what factors contribute to these constants?
We will talk about $b$ first. Remember, $b$ measures how quickly contagious people recover, which means that the value of $b$ is determined by two things primarily – the underlying biology of the disease, and the medical technology available for treating it. For bacterial infections, developing and administering antibiotics can drastically increase the value of $b$, since people recover faster. For a viral infection like COVID-19, however, the underlying biology does not allow this. While there are things that the medical system can do to increase $b$ for a virus, it is much more difficult.
And in any case, changing the value of $b$ doesn’t actually flatten the curve in the same way that we are seeing in the media. When you increase $b$, it does lower the peak, but there isn’t any noticeable ‘spreading out’ effect. Of course, lowering the peak means fewer people will be infected, which is wonderful, but the goal of our quarantining is to not only lower the peak, but also delay the peak so that our medical systems have as long as possible to prepare. Changing $b$ does not delay anything, but changing $a$ does.
The constant $a$ has many, many more factors which affect it, and so it is much easier to change. This constant tells us the speed at which the disease is spreading from person to person. Part of this is of course determined by biology – a disease that can spread through sneezes and coughs will have a larger value of $a$ than one that cannot. But the ways in which we interact with one another can also decrease or increase $a$. The more we physically interact with one another, the larger the value of $a$. The more we socially distance and wash our hands, the smaller the value of $a$. When you study the graphs on a computer, you can see that when you lower $a$, this not only lowers the peak but also delays it. This is what we want to do.
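To make the effect of the transmission constant concrete, here is a small Python sketch that finds the day and height of the infection peak for a few values of the transmission constant while the recovery constant stays fixed. As before, this rests on my own assumptions: the populations are fractions of the whole, the numbers are invented for illustration, the equations are stepped forward with simple Euler steps, and the function name is hypothetical.

```python
def infection_peak(a, b, i0=0.001, days=365, steps_per_day=10):
    """Return (day, height) of the peak of I under the SIR equations.

    S and I are fractions of the population (an assumption of this sketch).
    """
    s, i = 1.0 - i0, i0
    dt = 1.0 / steps_per_day
    peak_day, peak_height = 0, i
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = a * s * i * dt
            new_recoveries = b * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
        if i > peak_height:
            peak_day, peak_height = day, i
    return peak_day, peak_height


b = 1 / 14  # illustrative: a two-week illness
for a in (0.5, 0.35, 0.25):
    day, height = infection_peak(a, b)
    print(f"a = {a}: peak of about {height:.0%} of the population on day {day}")
```

With these made-up numbers, each reduction in the transmission constant produces a peak that is both lower and later, which is the 'flattening' we are after.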
Unfortunately, I don’t really have the time or knowledge to embed into this article an actual visual demonstration of how the curve flattens. However, I will provide links to websites that have this feature, as well as some very well-made YouTube videos about the same ideas. I’ll give links below. Feel free to leave me questions in the comments or email me at my website’s email address.
This site uses the variable $\beta$ (aka ‘beta’) in place of $a$, and $\gamma$ (aka ‘gamma’) in place of $b$. You can set the initial values of $S$ and $I$ here, as well as the values of the parameters, and the overall time frame. My recommendation for some good visuals:
Mess around with values of Beta between 0.2 and 2 to watch the curve ‘flatten’! Changing the value of Gamma will give you an idea of the difference between an effective and ineffective hospital system.
Previously, I have talked about logic and some of the most important rules of logic. These are quite important and useful in doing mathematics. However, it is necessary to go further, because logic is not specific enough. Mathematics analyzes patterns that involve concepts like shape, number, repetition, and symmetry. Pure logic does not adequately handle all of these ideas, and so we must expand our thinking to help us understand mathematical ideas.
Because of this expansion, and the complexity that emerges as mathematics develops, there are a variety of styles of proofs in mathematics that are useful. There are in fact some pretty large practical and strategic differences in various approaches that a mathematician might take to solve a given problem, and so before I elaborate on specific methods, I’d like to discuss some of the broader categories of ideas and proof styles that mathematics uses.
Direct and Indirect Proofs
The first distinction we can draw is between direct proofs and indirect proofs. As the names might suggest, the direct proof is the natural beginning point. In essence, a direct proof is one that accomplishes its goal in exactly the same format that the goal itself is presented. An example may make what is meant here clearer. Take some time to consider the following mathematical proof of the statement that an odd number times an odd number is always odd.
Proof: Recall that we can say a whole number X is odd if there is some other whole number Y for which X = 2Y+1. Suppose that N and M are both odd numbers. By definition of ‘odd number’, that means that for some whole numbers A and B, the relationships N = 2A+1 and M = 2B+1 are both true. We can now multiply N and M using the distributive property:
N*M = (2A+1)(2B+1) = 4AB + 2A + 2B + 1
Next, we make the observation that by taking a common factor of 2 in the first three terms above, we obtain N*M = 2(2AB + A + B) + 1. If we go back to our definition of an odd number, notice that the equation we just created satisfies this definition if we let N*M play the part of X, and 2AB + A + B play the part of Y. Since N*M satisfies the definition of oddness, we now know that N*M is odd.
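The algebra in this proof can be spot-checked with a few lines of Python. This is only a check over a finite range of values, not a proof, and the helper name `is_odd` is my own.

```python
def is_odd(x: int) -> bool:
    """A whole number is odd when it leaves remainder 1 after dividing by 2."""
    return x % 2 == 1


for A in range(50):
    for B in range(50):
        N = 2 * A + 1          # an odd number
        M = 2 * B + 1          # another odd number
        Y = 2 * A * B + A + B  # the whole number found in the proof
        assert N * M == 2 * Y + 1
        assert is_odd(N * M)

print("checked", 50 * 50, "pairs of odd numbers")
```

The proof does in one stroke what the loop can only do for finitely many pairs: it shows that the same Y works no matter which A and B you start with.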
Let’s pause for a moment and consider what we have done. The claim I made was that if you take two odd numbers and multiply them together, the result will also be odd. The discussion that ensued moved in the same direction as the initial claim I made. I started at the beginning of the sentence (with two odd numbers), and I moved towards the end of the sentence (their product is odd). This is the method of a direct proof – it flows in the same direction as the original statement.
Not all proofs are direct. Sometimes, there are roundabout ways to learn things. This occurs in common experience through the process of elimination. To summarize the process of elimination, suppose that you have some problem you have to solve that has 3 possible options. If you manage to demonstrate that two of those options fail, then the remaining option must work, even if you don’t know why. This is very similar to the idea of an indirect proof. To draw contrast between the direct and indirect proof methods, let’s prove the same statement from earlier, that an odd number times an odd number is always odd, using an indirect proof.
Proof: Suppose that we have two whole numbers N and M, and we are told in advance that N*M is not odd. Since, by definition, every whole number is either odd or even, N*M must be even. This means that there must be some whole number X for which the equation N*M = 2*X is true. Now, 2 is known as a prime number, which is a number whose only factors are 1 and itself (more on prime numbers another time). Because 2 is a prime number, we know that 2 has the special property that in any situation that 2 is a divisor of a number N*M, 2 is a divisor of either N or M (or perhaps both). Because of the definition of odd numbers, 2 cannot be a divisor of any odd number, and so the equation N*M = 2*X implies that one of N and M must not be an odd number. We may conclude that our original claim that an odd times an odd must be odd is true.
A trained mathematician would recognize the preceding paragraph as a proof by contrapositive, which we can discuss in more detail at a later time. The logic being used here is that the statement “If P, then Q” has the same meaning as the sentence “If opposite-of-Q, then opposite-of-P”. You can convince yourself of this if you’d like, or do some research of your own right now. I’ll provide an explanation of my own later. My main point for the time being is that this logic is correct, but on the surface is not the same sort of thing as the original statement.
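If you would like to convince yourself by brute force, here is a minimal sketch in Python. It assumes the usual truth-table reading of “If P, then Q” (false only when P is true and Q is false), which this post has not formally introduced; the helper name `implies` is mine.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth value of "If p, then q": false only when p is true and q is false."""
    return (not p) or q


rows = []
for p, q in product([True, False], repeat=2):
    original = implies(p, q)                # "If P, then Q"
    contrapositive = implies(not q, not p)  # "If not Q, then not P"
    rows.append((p, q, original, contrapositive))
    print(p, q, original, contrapositive)

# True when the two statements agree in all four cases.
contrapositive_matches = all(orig == contra for _, _, orig, contra in rows)
```

Under that assumption, the last two columns agree in every row, which is what it means for the two statements to say the same thing.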
To see this, we again take a bird’s eye view. The original statement, an odd number times an odd number is always odd, begins by assuming some information about the two separate numbers, which we gave the names N and M. Yet in the proof just given, we make an assumption about the single number N*M, not the separate numbers N and M. And the conclusion of the original statement is about the number N*M, but the conclusion of the proof I just gave says something about the two numbers N and M. So there is something “flipped” about this. Despite these differences, the proof still works and in the end establishes the same truth as the original proof does. The end-to-beginning feeling of this argument is why we use the title of indirect proof. Instead of attacking the question head-on, it addresses a related question that indirectly answers the question we originally asked.
This now completes an initial discussion of the difference between a direct proof and an indirect proof. We can now move on to another important distinction.
Existence, Uniqueness, and Universal Proofs
This distinction is not so much in the type of proof, but in the type of statement. However, I include this distinction in this post because statements of the three different types have proofs that are usually quite different from one another. So, I find this worth discussing now.
The differences between these three different types of statement, and types of proof, are about quantity. An existence proof is one that proves that it is possible for some particular statement to be true. For example, when we say that the equation 2x = 8 has a solution, and we justify this by noticing that x = 4 is a solution since 2*4 = 8, this is an example of an existence proof.
A uniqueness proof proves that a particular problem cannot have multiple different solutions. To use the same example, we can say that the only possible solution to the equation 2x = 8 is x = 4, which we can prove by ‘dividing both sides of the equation by 2’. Notice that we haven’t actually claimed here that x = 4 is a solution per se, all we have stated is that x = 4 is the only possibility. Checking that x = 4 is a solution is the existence proof from earlier.
Finally, there are what might be called universal proofs. An example of this would be the claim that the equation a*b = b*a is always true. As opposed to the two examples before, we are making the claim that it does not matter which numbers a and b might be, because this equation will be true in every case. Unlike in the previous proofs, it would not be enough to check examples; we would instead have to give a broader argument.
When reading and doing math, it is important to remember which of these ideas we have in mind, because it drastically impacts what we can and cannot do in our proofs.
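To make the three kinds of statement concrete, here is a rough sketch in Python. The search range `candidates` is an arbitrary choice of mine for illustration. Notice how differently the three claims behave: one witness settles existence, while a finite search can only fail to find a second solution or a counterexample.

```python
# Assumption: a finite range of whole numbers, chosen only for illustration.
candidates = range(-100, 101)

# Existence: exhibiting one solution of 2x = 8 is enough.
exists_solution = (2 * 4 == 8)

# Uniqueness: within this range, 4 is the only value that solves the equation.
# (The real uniqueness proof divides both sides by 2 and covers every number.)
solutions = [x for x in candidates if 2 * x == 8]

# Universal: no counterexample to a*b = b*a turns up in this range, but that
# is only evidence; a proof has to handle all a and b at once.
counterexamples = [
    (a, b) for a in candidates for b in candidates if a * b != b * a
]
```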
Constructive and Non-Constructive Proofs
Unlike the previous discussion, this distinction does not apply to all proofs. This distinction only relates to existence proofs. An existence proof can be either constructive or non-constructive. A constructive proof is one that gives a specific example that is a solution to the problem at hand. For instance, the proof that 2x = 8 has the solution x = 4 is a constructive proof, because we now know exactly what that solution is. On the other hand, a non-constructive proof is one that proves that there has to be an example, but does not actually give the example. Non-constructive proofs are very often indirect proofs. I will provide a more detailed example of a non-constructive proof later, but I would encourage the curious reader to consider the following:
Suppose that I have three boxes and four apples, and that I place all my apples into the boxes. Then one of the three boxes must have more than one apple. If you take your time and think it through, I think you can convince yourself that this is definitely correct. But if I then asked you which box has more than one apple in it, you would have no answer, because you cannot know without looking into the boxes. Notice that in this example, we know that there is a box with more than one apple, even though we have no idea which box that might be. This is the essence of a non-constructive proof.
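Because there are only finitely many ways to place four apples into three boxes, this particular claim can also be checked exhaustively. The following is a small sketch in Python; the names `placements` and `has_crowded_box` are mine, for illustration.

```python
from itertools import product

BOXES = 3
APPLES = 4

# Each placement assigns every apple to one of the boxes (3**4 = 81 in total).
placements = list(product(range(BOXES), repeat=APPLES))


def has_crowded_box(placement) -> bool:
    """True if some box receives more than one apple."""
    counts = [placement.count(box) for box in range(BOXES)]
    return max(counts) > 1


# Every single placement has a box with more than one apple...
always_crowded = all(has_crowded_box(p) for p in placements)

# ...but which box is crowded varies from one placement to the next.
crowded_boxes_seen = {
    box for p in placements for box in range(BOXES) if p.count(box) > 1
}
```

The check confirms that a crowded box always exists, and also that any of the three boxes can be the crowded one, so the argument by itself cannot tell us which.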
While abbreviated and perhaps not quite comprehensive, this should provide a strong enough conceptual overview of some of the different strategies one can use when doing mathematics. In the posts that follow, I hope to use this conceptual framework to lay out some specific ways that mathematicians use these concepts to learn about numbers and the world.
One of my closest friends, on hearing I was going to start doing some writing, brought me the following questions she was curious to hear me address:
Have you felt that math and the thought processes that you use to solve math problems have aided you in better understanding your mental health? Or made it more difficult? Do you see “mathematical thinking” as a therapy for you? If so, how do you think others who struggle with mental health could use a similar thought process to help?
I have wanted to write about this for quite some time, but didn’t quite know where to start. I am pursuing mathematics as my career, and “mathematical thinking” is both familiar and important to me. I have also suffered from post-traumatic stress, depression, anxiety/panic attacks, and ADHD in the past, and some of these still affect me. Plus, very nearly everyone I’ve ever cared deeply about has one or multiple similar psychological conditions. So issues of mental health are also quite familiar to me on a personal level (though I am not an expert, so please do keep that in mind).
But I’d never really thought out in detail the connections in my own life between these two areas of my thought life. My immediate response to the question of whether “mathematical thinking” affects how I think about mental health was a resounding yes, but I struggled to put exactly how into words. Hopefully I can now provide a partial answer to these questions.
I think the most natural place for me to begin answering is to point out why I think these questions stuck out to me so vividly. These questions imply, or at least suggest, something simple yet deep. You see, the questions implicitly expect that there is a connection between these two, between mathematics and my emotional health. This is so often ignored, downplayed, or outright denied. For instance, how often are mathematicians and scientists stereotyped as stoic, unemotional people? More than you think, probably. How sad and strange it is that by expressing passion (an emotion) for a subject like math, we are labelled unconsciously as unemotional people. There is in our culture a bifurcation between the objective and subjective. To be fair, the difference is a real one, but it is often misused. My friend’s questions are about a specific kind of interaction between the objective and subjective. I have put some time into thinking about objective/subjective interactions before in different contexts, and I imagine many of you have as well, so I hope this is a helpful framework through which to discuss the topics of mental health and math side-by-side.
One thing I view as a “key skill” in mathematical thinking is being able to isolate and compare various broad and important concepts. I try my best to carry this skill over when I think about other things, and find it helpful with thinking about my mental health and mental health issues more broadly. What I arrive at when I have applied this to my thoughts is that objective and subjective perspectives are both crucial to my life, and that neither can totally explain the other. Mathematicians are just as emotional as everyone else and rely on subjective perspectives just like everyone else, and artists think objectively about the world around them too.
There are two opposite errors that seem to pop up fairly often here. One is the emphasis on the objective over and against the subjective; and the second is the opposite. Both of these are worth discussing in relation to mental health, and so I’ll handle them one at a time.
Firstly, I think there is pretty strong evidence of a bias towards the “objective” in our culture. We tend to associate “objectivity” with fields like physics, math, and perhaps even chemistry and biology. We tend to associate “subjectivity” with humanities, perhaps also religious/political beliefs. I find it quite odd that our culture seems to value what physicists and biologists say about what we view as “subjective” fields and that we don’t care at all what artists, musicians, or religious figures say about physics or chemistry. I find it odd because both of these situations represent a person who is an expert in one topic speaking about an unrelated topic, and we trust the one and reject the other. The only way I’ve been able to make sense of this situation is a bias in favor of what a person views as “objective.” I also find it odd because the “subjective” fields from before are actually largely objective. If you told an artist that Van Gogh was a bad painter, there is a good chance she would tell you that you are wrong, not merely that she disagrees with you. This is an objective claim. Any discipline in the humanities you can think of will have certain objective ideas at its core.
From my perspective, it seems like this stems from a confusion of objectivity with a sort of empiricism – roughly the idea that only something that can be demonstrated by some kind of experiment counts as true. This kind of thinking tends to place math, physics, and chemistry on a pedestal, and places everything else into some kind of ranking. I think this idea has a lot to do with stereotypes about mathematicians and scientists, an overvaluing of math/science and an under-appreciation of humanities, arts, and culture. Personally, I also find the trend of downplaying subjectivity very dehumanizing. For if things like emotions are merely chemical reactions, what makes us any different from a camel, an insect, or even a rock? Why is one kind of chemical reaction more valuable than another? When the reality of subjectivity is denied or downplayed, we end up with these kinds of really tricky questions.
So, when I contemplate my mental health from this angle, I constantly remind myself that it is actually real, which is actually sometimes a kind of comfort. It is actually objectively true that I experience emotions, and my emotions matter a great deal. This is a part of my experience as a person that is worth talking about. To downplay the role of subjective experience, for the person dealing with mental health issues, is to say that their emotion does not matter. So, reminding myself of the importance of my mental health in that way helps me.
On the flip side, we cannot overvalue subjectivity either. We are feeling beings, yes, but we are also thinking beings. We have the incredible ability to reason. It is worth pausing for a bit on that. We are capable (at least a lot of the time) of recognizing objective truths about the world around us and articulating those truths. We can also combine different things we already know to learn something new. To deny all of this would, as before, be to deny a crucially important part of what it means to be human. And not only would denying the importance of objectivity deny us a large part of who we are, it would pretty much mortally wound whatever is left of us. If all that matters is how you feel and how you see things, what about people with depression? They see themselves as worthless, and some even think the world would be better if they were dead. We rightly encourage our depressed friend, telling them that they are not worthless and that we care about them. This only makes sense if their opinion about themselves is superseded by something else, objective and outside of them. And even more generally, if our value as human beings is based on anything subjective, then we will be crushed under the burden of having to prove ourselves every minute of every day.
This leads into another main point where my “mathematical thinking” helps me with my battles with mental health. Even in the face of very real pain, I know objectively that I am more than this. I can explain to myself using reason how I know this. I can know that I am loved, even when all my feelings say I am not. I can know I am safe, even when all my memories tell me I will never be safe. This kind of objectivity helps me stabilize myself emotionally. As I remind myself constantly of these things that are true about me, the battle becomes easier to fight.
These combined ideas have been very important in my thinking and reflecting on life and faith. We cannot get anywhere on any important discussion unless we recognize the full humanity of ourselves and others, which includes our intellect and ability to think reasonably as well as our deep emotional and subjective experiences of the world. Any worldview that denies, vilifies, or downplays either of those, to be blunt, is dehumanizing. And one other point I ought to make sure is clear – there is a sort of “middle ground” between the subjective and objective. To see an example of what I’m trying to say here, consider a man who goes 48 hours without food, and tells you “I am hungry.” On one hand, this is a report of a subjective experience of hunger. On the other hand, this is an objectively true report on his part. And there are certain things that are true in general about what it is like to be hungry, or how hungry people tend to behave, or which kinds of actions relieve this feeling of hunger. Talking about things that we experience subjectively like emotion, morality, and beauty doesn’t preclude the possibility that there is also an objective element to all of these. Perhaps this point will be a discussion for another time. But for now, more on mental health.
Now that I’ve laid out some initial thoughts, I want to zoom in specifically on mental health and thinking mathematically. The thought processes of mathematics/science have led me to understand more deeply that scientific thinking does not and cannot solve everything. Science is not omnipotent, nor was it ever meant to be. Part of my approach to my own mental health is scientific – I have had ADHD since childhood and I take medication which helps me in a lot of day-to-day activities. But my medication is not a fix-all either. ADHD is very misunderstood, there are lots of symptoms and even more ‘side-effects’ of ADHD that I live with daily, some of which are good and some harmful. What remains after medication must be dealt with in other ways.
It is important to remember that most mental health disorders are at least partially caused by brain malfunction. Yes, there are factors other than brain chemistry, and there are times when those factors are actually primary and the chemistry only secondary. I have mental health issues that are only tangentially affected by biology and are largely caused by other things. But it is undeniably a factor for many, and medical malfunctions call for medical solutions. It’s just common sense. Things like prayer and emotional support absolutely also have a role in these situations. Even with a broken leg or cancer, there is an obvious role for emotional and spiritual comfort. But to deny the objective nature of a chemical malfunction in the brain is a tragic error.
On the other hand, the subjective element of mental health is undeniable. It isn’t always chemical. Much of my struggle with mental health is primarily caused by things like traumatic experiences mixed in with poor self-perception or other emotional struggles. These cannot be fixed with medication or science. The only way I know to heal from something like this is to talk to another human, to be understood, to be known and loved and cared for.
I have dealt with major depressive disorder (aka depression) and post-traumatic stress disorder (aka PTSD) or borderline versions of these disorders, on and off for the last 4 or 5 years. Just so that nobody worries, I’m doing quite well now compared to when all of this began. Praise God, the worst is behind me. I remember all too vividly, a little more than a year ago, shaking and crying uncontrollably, unable to move my own body for an hour, because of nothing more than a memory. Even today, sometimes a simple flash of a memory hits me physically like a violent push. I remember the temptation towards self-harm. I remember knowing that the pain of self-harm would have been far more bearable than what I was feeling and thinking about myself every day. I remember when the words “I care about you” sounded to me like someone trying to convince me 2+2=5. I remember assuming that everyone hated me subconsciously as much as I hated myself, that I would never be worthy of being anything more than a metaphorical punching bag to help other people recover from their own pain.
For anyone reading this who knows these or any similar feelings, I am here for you and I care, no matter what your thoughts tell you. Even if I don’t know you, or if you think I don’t like you, or whatever else you think about me, I honestly do care. Email me, text me, reach out to me in any way, and I will gladly speak with you and help you in any way I can. Reach out to me or someone else you trust if you need help, because you matter. You have infinite value and worth, as does every human being on this earth. You are so much more than those feelings you have, just like I am more than the pain I’ve just described.
To repeat a previous point, everything I described is not caused by a chemical imbalance, or at least not primarily. When I was just beginning to discover my own emotions, I grew close to someone whose emotional struggles had an unfortunate effect on my own. I consider this person a friend still, and I don’t want to go into details about what exactly happened – but what I’ve already said should tell you enough about how it affected me. I grew to believe that I was unworthy of normal friendship because I didn’t know how to handle the confusing and painful thoughts going through my head.
Biology didn’t cause this, so biology shouldn’t be the primary solution either. The answer to my problem was loving friendships, and above all experiencing the love of Jesus Christ. I have friends who, when they learned what I was going through, resolved to dig me out of this hole I’d fallen unwittingly into. I was incapable of telling myself that I was worth caring about, so they became that voice they knew I needed to hear. I was too weak to fight all the memories and negative voices myself, and they knew that and fought for me and spent months rebuilding me psychologically so I could fight the battle myself. And, most importantly, they as Christians saw the spiritual battle going on in my soul, and they knew that only the love displayed by the cross could give me the strength to carry my own painful cross. And that’s exactly what happened. No greater love has anyone than this, that he lay down his life for his friends. I had friends who carried my burdens alongside me, and who pointed me to the one who truly laid his life down for me.
I could never have recovered were it not for this. I don’t want to go so far as to suggest that conversion to a religion is the only answer to mental health issues, but I will say that it is the only ultimate, lasting source of healing and hope that I know. I remember so vividly realizing that God cared so much about my pain that he came down and suffered alongside me, just like my friends had been doing for months. This didn’t magically delete all the pain, but it has infused me with the strength and hope to pull my battered soul off the ground and fight back.
This leads well into a very practical way that my mathematical thinking has aided with my mental health. When I first became a follower of Jesus, I grew strong enough to fight back against the forces that were dragging my soul into despair. That meant I had to learn how to fight. Once we recognize that this is a fight, we can be strategic. When you are in a competition of any kind, including a war, you analyze your opponent and pinpoint their weaknesses, while making sure to use your own strengths. Realizing this, I took my years of training in thinking analytically and took to analyzing all the darkness that plagued me. I did this through emotional and spiritual conversations with trusted friends, through reading, and through intentional introspection. These dark emotions and thoughts seemed to have some kind of goal in mind, and I tried to discover what those goals were.
One thing I realized is that the dark can be just as emotionally powerful as the light, so emotion alone would just lead to a standstill in the battle. Negative and positive emotions can be equally intense and swaying. So, as important as the subjective, emotional elements were in the process of healing, I needed another weapon. And praise God, we have another weapon. The darkness has no claim on truth. Darkness can only attack with half-truths and deception, but with the light is everything true and good. This I know to be true as a believer in Jesus. Therefore, I attack darkness with every weapon available to me, both with emotion and with truth.
And I don’t mean “my truth” – that’s just emotion pretending to be truth – I mean the truth. This can come from both religious and secular sources – part of the truth I use to fight came from seeing therapists and being in group therapy. Part came from church and from studying the Bible. Part came from recognizing the role of suffering in a broken world. Part came from learning philosophy and theology, growing in my understanding of God’s amazing power, majesty and love. And since you can’t fight an emotional battle with facts alone, I leaned on and continue to lean on those around me that I trust the most. They know me, they know the details of my flashbacks, to the point I only need to say one or two words and they know exactly what’s in my mind. They know that I sometimes don’t have the strength to remind myself of what is true, and I know they have given me permission to pour out the various emotions that come up when I’m in a bad place. These kinds of friendships point me back to God and allow me to process everything going on in my head.
I hope a reader will think of all of this as initial thoughts, and not ascribe to them more weight than they deserve. I do believe what I have written here, but as with any conversation involving human psychology and emotion, things are very complicated. I hope to develop some thoughts further in the future, and I also hope to share more of my personal struggles with various mental health conditions.
As a closing note, I am interested in knowing what questions others have, especially on matters of mathematics, the Christian faith or my personal experience with it, or other matters about which I am interested. Feel free to reach out with some questions, I’d be happy to discuss whatever I can!
There are a lot of different ways to approach understanding what mathematics truly is in modern times. And these different approaches are fundamentally different. Does one come from the angle of discussing the beauty of patterns? Or perhaps of the power of the human mind to move from specific examples to general conclusions (similar to science)? Or maybe the historical development of mathematics and how different cultures used it? Or an exposition of the different kinds of things that mathematicians study? All of these I find quite reasonable approaches, and I have and will adopt each of these when I write and speak on my beloved subject, and I’d love to write on each of them.
But for now, I wish to take a different angle. Instead, I wish to emphasize the patterns of thought and of writing that mathematicians most heavily rely on; I think adjectives like “rigorous” and “logical” are probably accurate. What I hope I can do here is to develop a series of posts which brings a clearer picture of what mathematics is about through discussing how mathematics is communicated. I have tried to bring in as few assumptions as possible to make this accessible, and I hope that the approach I take here will make certain “abstract” ideas easier to process by first explaining the framework out of which the abstract ideas arise.
Without further ado, then, we must begin by understanding the absolute, rock-bottom concept that underlies all of mathematics. Even more deeply than the idea of number itself, mathematics has as its foundation one word – truth. It may seem an odd question, but we must begin by asking what truth actually is. What does it mean for something to be true or false? This is surprisingly difficult to answer when you have to be precise, but we need not go into philosophical debates here. For our purposes, it is enough to say that a statement is true if, and only if, the information conveyed by that statement corresponds to the way reality actually is. A false statement has precisely the opposite definition. Truth is a fundamental concept, and is vital in every human endeavor. Even in the arts, truth is at the center. Take music, for instance. We know that certain combinations of musical notes sound pleasant, and others do not. This pleasantness I think of as not merely experiential, but as something true. Reality in fact is a certain way, and we tell the truth when we accurately reflect that reality in our speech.
The next step towards mathematics is the idea of an argument, the basis of the study which has the ‘fancy’ name of propositional calculus. The term ‘calculus’ really just denotes some form of calculation, and the adjective ‘propositional’ means ‘dealing with truth and falsehood.’ And when we say ‘argument’, we don’t mean angry bickering back and forth. Rather, any train of thought that is attempting to prove a point of some kind counts as an argument. So, the ‘propositional calculus’ is really nothing more than understanding the truth or falsehood of various combinations of true or false statements. For simplicity, I will give all of this the umbrella term of logic.
What then, exactly, does logic encompass? As we are beginning at the very foundations of logic, our goal is to discover what kinds of statements are undeniably true given other kinds of true statements. We aim to know in what manner we can use truths we already know to learn more truths. This study has been undertaken in depth for thousands of years, as long as intellectual discussions have been around. To begin this study, we use as an example probably the most common form of argument ever used, and one that we all know and use in daily life and thought.
Suppose you have invited a friend over to your house. You also know it happens to be raining outside today. You are in another room, and you hear your friend enter your front door and exclaim “I forgot my umbrella and raincoat!” If I were in this situation, I would immediately start thinking “Oh, my friend is wet, let me go get him a towel.” If we slow down for a bit, isn’t that strange? Your friend never actually told you they are wet. The reason we all assume he is wet is because we have an internal understanding that, if it is raining outside, then a person outside without rain gear will get wet. We combine this truth we already know with what we heard our friend tell us, and conclude that our friend must be wet.
The previous paragraph counts as an argument. We can summarize the general form of this argument by using the italicized letters P and Q to be placeholders for generic statements. The logical argument of the previous paragraph has two premises, that is, two initial facts available to us. We know that “If P, then Q” and “P”, where P represents “it is raining outside” and Q represents “you will get wet without rain gear.” From these two facts, we understand that Q is also true. In the form of a list,
(1) If P, then Q,
(2) P,
(3) Therefore, Q.
This is the starting point of logic – the idea of fusing together certain kinds of truth (like “If P, then Q” and “P”) to obtain another truth (“Q”). The question we now ask is, which basic building blocks can we combine to form other, related true statements? Not just any two truths will do; we must be absolutely careful in thinking about how our statements match up with reality. We are to be as careful as possible, because our goal is that whatever we say, it must be absolutely impossible that we are incorrect. This is the case with our first example: for if we know that “If P, then Q” and “P” are true statements, then the very meanings of these statements inform us that “Q” is undeniably true.
For those of you that are interested in such things, the technical name for the argument form we have just discussed is modus ponens, Latin for “mode that by affirming affirms.” It is called this because, by affirming “P”, we end up affirming “Q”, so one affirmation leads us to another affirmation. The next natural argument form to bring up is given the Latin name modus tollens, “mode that by denying denies.” This argument enables us to, by denying one statement, conclude that we can deny another statement. Let us reconsider your visiting friend. For simplicity, I think it is fair to say that “if it is raining, then the ground is wet.” Suppose your friend, upon entering the house, tells you that the ground is dry. In your head, you will recognize then that it must not be raining, for if it were raining, the ground would be wet and not dry. Using the letters P and Q for shorthand once again, we have the following presentation of our new argument form:
(1) If P, then Q.
(2) Not Q.
(3) Therefore, not P.
If you take the time to think about it, this is also airtight. If (1) and (2) are true, to deny (3) would be utterly irrational. Formulating sequences of abstract, airtight reasoning is the goal of logic.
What more then can we say? This can sometimes feel like beating a dead horse; after all, we can all use this kind of logic without thinking. And yet I still think it is important, as we all screw this up from time to time and don’t think through what we say, do, or believe. To end this post, I will present what are generally considered the nine fundamental rules of logic which, when combined with suitable definitions for how to understand words like ‘not’, ‘or’, and ‘and’, form the complete basis of the fundamentals of logic, upon which more sophisticated thought can be built.
Briefly, before these are discussed, a final point must be made about the definition of truth. There are certain laws of logic, which are taken as even more basic than the ones just discussed, that I will lay out. These are called the Law of Non-Contradiction and the Law of the Excluded Middle. These can be described as follows:
Law of Non-Contradiction: A statement cannot be both true and false.
Law of the Excluded Middle: Any proposition (roughly speaking, any matter-of-fact claim) is either true or false.
These are absolutely essential to all human thought; so much so that it becomes impossible to deny either of them without assuming that they are true. (For instance, if the Law of Non-Contradiction is false, it could also be true, and then it cannot be false, but it is false… spend a few minutes trying to think through that mess!)
To close an initial discussion of logic, for those who are interested, here is a list of the 9 basic building blocks used for a rational, logical argument: (Throughout, the letters P, Q, R, and S are symbols that represent some statement, any line without the word ‘therefore’ is a premise, and any line with ‘therefore’ is a conclusion)
One can safely ignore all the fancy names; these arguments are given names just so that we can refer to them in sentences without having to write them out every time; the ideas behind them are things that can be understood by the usual meanings of the words ‘and’, ‘or’, ‘if’, and ‘then’.
Modus Ponens
If P, then Q,
P,
Therefore, Q.
Modus Tollens
If P, then Q,
Not Q,
Therefore, not P.
Hypothetical Syllogism (stringing together two if-then statements)
If P, then Q,
If Q, then R,
Therefore, if P, then R.
Conjunction (“True and True = True”)
P,
Q,
Therefore, P and Q.
Simplification (If both are true, then each is true on its own)
P and Q,
Therefore, P.
(Similarly, therefore Q)
Absorption (If P implies Q, then Q can be thought of as ‘contained inside’ P)
If P, then Q,
Therefore, if P then P and Q.
Addition (If P is true, then P or Q is also true)
P,
Therefore, P or Q.
Disjunctive Syllogism (If an ‘or’ statement is true, at least one part of it is true)
P or Q,
Not P,
Therefore, Q.
Constructive Dilemma (Combining two if-then statements with ‘and’)
“If P, then Q” and “If R, then S”,
P or R,
Therefore, Q or S.
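For readers who like to experiment, the nine rules can be checked mechanically. What follows is only a minimal Python sketch, and it assumes the usual truth-table reading of ‘if-then’ (false only when the ‘if’ part is true and the ‘then’ part is false). The names implies, RULES, and is_valid are made up for this illustration.

```python
from itertools import product

def implies(p, q):
    # Assumed truth-table reading of "if p, then q":
    # false only when p is true and q is false.
    return (not p) or q

# Each rule is (list of premises, conclusion); every entry is a function of P, Q, R, S.
RULES = {
    "Modus Ponens": (
        [lambda P, Q, R, S: implies(P, Q), lambda P, Q, R, S: P],
        lambda P, Q, R, S: Q),
    "Modus Tollens": (
        [lambda P, Q, R, S: implies(P, Q), lambda P, Q, R, S: not Q],
        lambda P, Q, R, S: not P),
    "Hypothetical Syllogism": (
        [lambda P, Q, R, S: implies(P, Q), lambda P, Q, R, S: implies(Q, R)],
        lambda P, Q, R, S: implies(P, R)),
    "Conjunction": (
        [lambda P, Q, R, S: P, lambda P, Q, R, S: Q],
        lambda P, Q, R, S: P and Q),
    "Simplification": (
        [lambda P, Q, R, S: P and Q],
        lambda P, Q, R, S: P),
    "Absorption": (
        [lambda P, Q, R, S: implies(P, Q)],
        lambda P, Q, R, S: implies(P, P and Q)),
    "Addition": (
        [lambda P, Q, R, S: P],
        lambda P, Q, R, S: P or Q),
    "Disjunctive Syllogism": (
        [lambda P, Q, R, S: P or Q, lambda P, Q, R, S: not P],
        lambda P, Q, R, S: Q),
    "Constructive Dilemma": (
        [lambda P, Q, R, S: implies(P, Q) and implies(R, S),
         lambda P, Q, R, S: P or R],
        lambda P, Q, R, S: Q or S),
}

def is_valid(premises, conclusion):
    """Return True if the conclusion is true in every case where all premises are true."""
    for values in product([True, False], repeat=4):
        if all(premise(*values) for premise in premises) and not conclusion(*values):
            return False
    return True

for name, (premises, conclusion) in RULES.items():
    print(name, is_valid(premises, conclusion))

# A tempting but invalid form, for contrast: "If P, then Q; Q; therefore P."
affirming_the_consequent = (
    [lambda P, Q, R, S: implies(P, Q), lambda P, Q, R, S: Q],
    lambda P, Q, R, S: P)
print("Affirming the consequent", is_valid(*affirming_the_consequent))
```

The check simply tries all sixteen combinations of true and false for P, Q, R, and S. A rule counts as airtight if no combination makes every premise true while the conclusion is false.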
To close, we ask why these rules are considered fundamental, and other potential rules of logic are not. Firstly, these rules serve the purpose of making fully clear the meanings of the terms and, or, not, and if-then. No extra rules are needed, because any other rules using these words could be built by combining these in different ways.
But more than this, there are other important logical words that are not part of this system (words like necessary, every, and some are examples). But it would be exceedingly difficult, if not impossible, to use these additional words without also using the simpler ones. So the system just described is in a sense the “smallest” system of logic. Everything else, including mathematics, is an extension of this system, with new rules and ideas added.
Transforming this 9 rule system into genuine mathematics is actually quite tedious; doing so would probably need a whole new series of posts. For my purpose here, let’s just say that when you add words like numbers, addition, every, and some to your language, you can start to call this math.
One thing that many people dislike about math is all of its special symbols. Why does it need to be so specific? Why so many symbols? Why is writing things down in a specific format so important in math? Shouldn’t something be considered “correct” if it has the right ideas, regardless of how it is presented?
These are common questions, and often come in the form of a complaint, and understandably so. There is a huge problem with people being told in school they are wrong because they got to their answers in a way that, while correct, is not “what they wanted.” There are specific situations where it is reasonable to expect a student to learn a particular method for solving a problem. But in general a correct solution should be counted as such even if arrived at in an unexpected way.
But notation is completely different. One of the primary reasons that we use all of the symbols that we do is to make the mathematical content of what we write clear and unambiguous. This may sound strange to a lot of people. I’ve heard many, many people tell me things like “math is a foreign language to me.” These people might think that all these variables and symbols are too confusing to understand. Ironically, this actually reveals why notation is so important – mathematics is a language, and languages need grammar.
If someone wrote a paper in English class with really good ideas but horrific grammar, they would not get a perfect score on their paper. There is a very good reason for this – the rules of grammar help us communicate. A good piece of writing should convey information clearly, and poor grammar prevents that from happening. Too many grammatical mistakes will severely impact the quality of the writing. Math is like this too. The reason we have the symbols we do is to make what we write down unambiguous. This is also the same reason that things like dictionaries exist – imagine trying to have a conversation with a person that defines every word of English differently than you do. It simply cannot be done. It is for exactly this reason that mathematics needs to be precise.
For example, the “equals sign” = serves a role similar to a punctuation mark. It shows us that an expression is ending, and tells us that the next statement is related to the one we just read. The “plus sign” + is sort of like the word “and.” It connects things together, just as the word “and” does in English. Other symbols have other meanings. Since the + symbol can only mean one thing, “add two numbers together,” this is certainly unambiguous. That is the point.
The other common complaint is that all of the symbols are too “abstract” and difficult to understand. Every so often, the school system will change the way it teaches math for reasons like this. This is worth considering. Just as from time to time we update the rules of grammar to accommodate the ways that language has shifted over time, we should be cognizant of whether a similar shift has occurred in the ways we think about math.
But any change in the way we use our notation is necessarily only very minor. Even if we make larger changes to how we compute sums, the way we express our final answer will be the same. There is a very good reason we use the sorts of symbols we do; they are not arbitrary. There was a time in history where there was no standard way to write down mathematical statements, including numbers themselves. Today, we mostly use the number system that reached Europe through the Arabic-speaking world, but the Chinese had their own system, and so did the Romans. Many of us are semi-familiar with Roman numerals and can read them, but imagine being asked to multiply XIII by IV. How would one systematically go about that? There really isn’t any convenient way to, because Roman numerals are written using patterns that make computation very hard, even if reading Roman numbers is not terribly difficult.
This is why Arabic numerals won out. They are both easy to read and easy to compute with. The major innovation in the number system we use is positional notation, which basically means the location of a digit within a number actually matters. As a consequence of this, the short list of symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be combined to form any number. Consider for instance the numbers 1234 and 4321. Even though the same four symbols were written down, the fact they are in different positions makes the numbers different. The symbol “1” is being used to mean “one” in 4321 and “one thousand” in 1234. These facts make Arabic numerals very easy to work with, and so over the course of history this system of numbers won out, because it was more convenient.
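To make the idea of positional notation concrete, here is a small Python sketch. The function name positional_value and its base parameter are my own choices for illustration. It shows how the same four digits produce different numbers depending on where each digit sits.

```python
def positional_value(digits, base=10):
    """Combine a list of digits (most significant first) into a single number."""
    total = 0
    for digit in digits:
        # Shifting everything one place to the left multiplies it by the base.
        total = total * base + digit
    return total

print(positional_value([1, 2, 3, 4]))  # 1234
print(positional_value([4, 3, 2, 1]))  # 4321

# The digit 1 contributes 1 * 1000 in the first number and only 1 * 1 in the second.
print(1 * 10**3 + 2 * 10**2 + 3 * 10**1 + 4 * 10**0)  # 1234
```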
The use of variables and equations happened much the same way. For a very, very long time, all mathematics had to be done using sentences. And there was no standard way to write your sentences about numbers. To make clear what this must have been like, I will state the famous Pythagorean theorem in two ways – one where I take advantage of symbols for variables, and one where I do not.
Without variables: For every right triangle, the area of the square standing on the longest side of that triangle is equal exactly to the combined areas of the squares standing on the smaller sides of the triangle.
With variables: If a right triangle has side lengths a, b, and c, with c the largest length, then a^2 + b^2 = c^2.
See how much quicker the second statement is? It’s a lot less writing, but conveys exactly the same information. Now, imagine if you had to solve the following problem:
Twice a number added to seven times a second number less three times the first is equal to the second number minus four times the first number. What is the relationship between the first and second numbers?
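Now imagine if you had to write a solution in paragraph format. Imagine having to simplify this without symbols! You’d have to rewrite this long sentence four or five times. This is cumbersome. But using the notation we have today, we can write x for the first number and y for the second, and the problem becomes
2x + 7y - 3x = y - 4x,
which simplifies to 3x + 6y = 0, or x = -2y.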
See how much easier that is? The exact same information is presented in each of these situations, but the symbols and notation enable us to do this in a very efficient manner and also to navigate towards solutions efficiently. When written in words, when you want to find an answer, you have to rewrite long, complicated statements every time you make even a tiny modification. Thus, over time people who do mathematics devised new ways of writing down their ideas that were easier. And as the peoples of the world interacted on a more global scale, the best ideas of different regions were adopted. The proof methodology we use now was developed most thoroughly in Europe, but the way we write numbers is Arabic in origin, and the earliest usage of 0 as we use it today is Indian.
So, mathematics is written the way that it is because it makes math easier to do. This sometimes has the unfortunate side effect of making it harder to understand to someone who is not an expert. Even mathematicians still feel this way from time to time. But despite this, the many symbols, strange words and unusual definitions used in mathematics have reasons to be as they are. These abstractions are what have made mathematics as we know it possible.
In Part 1, we began discussing primitive Pythagorean triples, and thought a little bit about them. Our goal now is to characterize all primitive triples.
Limiting The Possibilities
Suppose we are given a primitive triple (a,b,c). Recall that this means that the three positive whole numbers a, b, and c share no common factor and satisfy the equation a^2 + b^2 = c^2. Our goal now is to think about these conditions, and to learn as much as possible about the three numbers. There are a lot of things that you can try (and very few people if any would come up with what I present here on their first try) but this line of thought is one thing that, with enough time, people thought to try.
For instance, it is an important fact that a perfect square, when divided by 4, has remainder either 0 (if the number is even) or 1 (if the number is odd). The reader can verify this for themselves. Using this information, keeping in mind that a^2 + b^2 and c^2 must have the same remainder when divided by 4, we can discover that c must be odd, and exactly one of a, b is odd. (If a and b were both odd, a^2 + b^2 would have remainder 2, which no perfect square has. If they were both even, c would be even too, and the triple would not be primitive.) Since a and b are interchangeable, we will suppose that a is odd and b even.
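If you would rather see the remainder fact than take it on faith, here is a quick Python sketch. It is a numerical check rather than a proof, and the names square_remainders and sum_of_squares_remainder are just mine.

```python
# Remainders that perfect squares leave when divided by 4 (checked for 0 through 999).
square_remainders = {(k * k) % 4 for k in range(1000)}
print(square_remainders)

def sum_of_squares_remainder(a, b):
    """Remainder of a^2 + b^2 when divided by 4."""
    return (a * a + b * b) % 4

# One odd and one even representative for each combination of parities.
for a, b in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    r = sum_of_squares_remainder(a, b)
    # The last column says whether a perfect square could have this remainder at all.
    print(a % 2, b % 2, r, r in square_remainders)
```

The row with two odd numbers is the interesting one: it gives remainder 2, which never shows up among the square remainders.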
By manipulating the Pythagorean equation, subtracting a^2 from both sides and factoring “the difference of squares” c^2 - a^2 = (c - a)(c + a), we can conclude that b^2 = (c - a)(c + a). We do not yet know anything about the factors of the two terms in (c - a)(c + a), but we will use a trick with fractions to get around this. Dividing both sides by b(c - a), we get a new equation
b/(c - a) = (c + a)/b = m/n,
where we choose m/n to be the fraction in lowest terms. If you ‘flip’ the first and third terms in the equality above, you obtain the new equality
(c - a)/b = n/m.
We can add/subtract these equations together to see that
m/n + n/m = 2c/b and m/n - n/m = 2a/b.
Combining the fractions on the left of each equation using the common denominator mn and then dividing both sides by 2 gives us equations
(m^2 + n^2)/(2mn) = c/b and (m^2 - n^2)/(2mn) = a/b.
Since a, b, and c have no common factors, the fractions a/b and c/b are already in lowest terms. Because of this, if we can ensure that our new fractions are also in lowest terms, then the tops are equal and the bottoms are equal, and we would then be able to conclude that
a = m^2 - n^2, b = 2mn, c = m^2 + n^2.
So, we now have a pathway to part of our answer! All we have to do now is to set up boundaries for the values of m and n within which we know that the three values m^2 - n^2, 2mn, and m^2 + n^2 will have no common factors. As it turns out, we can actually force this to happen. To see this, suppose a prime number p happens to be a factor of all three of these expressions. Then it must be a factor of (m^2 - n^2) + (m^2 + n^2) = 2m^2 and of (m^2 + n^2) - (m^2 - n^2) = 2n^2, and because of this (as long as p is not 2) it must also be a factor of both m and n. But, go back a few paragraphs, when we defined m and n. We defined the fraction m/n to be in lowest terms, but we have now claimed that m and n have a common factor. As it turns out, this makes no sense. By assuming we could find a common factor, we contradicted ourselves. This is a mathematical trick called a proof by contradiction (more on this another time), but for now we need only say that this rules out every odd prime as a common factor of the three values. The prime 2 needs separate care, which is where the question of whether m and n are even or odd comes in.
We’ve now done a good deal of exploration, and we have actually solved half of the problem! We have established that every primitive triple (a,b,c) can be associated with these three numbers, m^2 - n^2, 2mn, and m^2 + n^2, exactly when they have no common factor. I will leave as a problem for a curious reader the following:
Claim: The numbers m^2 - n^2, 2mn, and m^2 + n^2 have no common factor if, and only if, m and n share no common factor and exactly one of them is even (and the other is odd).
This reduces every primitive triple to an easy-to-produce formula… but we actually aren’t quite done. We can ask now the reverse question of what we just did. Instead of starting with a, b, and c, what if we start with m and n? Does our new formula always work? Or only sometimes? In fact, it will always work. To see this, we want to add together the squares of the two smaller numbers and see if it is equal to the square of the larger number. In order to do this, recall briefly the “foiling” method that gives us the formula (x+y)^2 = x^2 + 2xy + y^2. Using this, we can see that
(m^2 - n^2)^2 + (2mn)^2 = m^4 - 2m^2n^2 + n^4 + 4m^2n^2 = m^4 + 2m^2n^2 + n^4 = (m^2 + n^2)^2.
So these are the same! The Pythagorean equation is satisfied. Our new construction can now move both ways. We can start with (a,b,c) and find the numbers m and n, or we can start with m and n and find (a,b,c). What this does now is establish the following complete answer to our original question.
Theorem: The three numbers a, b, and c can form a primitive Pythagorean triple if, and only if, these three numbers can be expressed (in some order) by the numbers m^2 - n^2, 2mn, and m^2 + n^2, where the numbers m and n have no common factors, are not both odd numbers, and where m is larger than n.
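The theorem can be tested by computer, at least for small numbers. The Python sketch below builds triples from m and n and compares them against a plain brute-force search. Both function names are invented for this example, and a finite check like this is evidence, not a proof.

```python
from math import gcd, isqrt

def primitive_triples_from_formula(limit):
    """Triples (m^2 - n^2, 2mn, m^2 + n^2) with largest number at most limit."""
    found = set()
    m = 2
    # For a given m, the smallest possible value of m^2 + n^2 is m^2 + 1.
    while m * m + 1 <= limit:
        for n in range(1, m):
            # m and n share no common factor and are not both odd.
            if gcd(m, n) == 1 and (m - n) % 2 == 1:
                a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
                if c <= limit:
                    found.add((min(a, b), max(a, b), c))
        m += 1
    return found

def primitive_triples_brute_force(limit):
    """Try every pair a <= b and keep the primitive triples with c at most limit."""
    found = set()
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            c = isqrt(a * a + b * b)
            if c <= limit and c * c == a * a + b * b and gcd(gcd(a, b), c) == 1:
                found.add((a, b, c))
    return found

print(sorted(primitive_triples_from_formula(30)))
print(primitive_triples_from_formula(100) == primitive_triples_brute_force(100))
```

Each triple is stored with its smaller leg first, which is what “in some order” in the theorem allows for.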
Our question has been answered, which is quite a wonderful thing, but this approach requires a lot of guesswork and toying around to discover. There is a second, more beautiful and simple solution that gives us the same answer and I think gives a lot more insight about where all of this is coming from. This second solution will come in a post of its own.
(If you haven’t read the “Problem” post with the same title, go there first. This will make more sense if you do.)
We want to find all the Pythagorean triples (a,b,c). The first thing a mathematician would probably do is to try some small examples, gather some information, and then look for patterns within that information. For instance, if you only allow a to be a number from 1 to 10 and if you let b be between 1 and 50, here are all the triples you get:
(3,4,5), (4,3,5), (5,12,13), (6,8,10), (7,24,25), (8,6,10), (8,15,17), (9,12,15), (9,40,41), (10,24,26)
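If you would like to gather this kind of data yourself, a short brute-force search does the job. This is just one way to sketch it in Python, and find_triples is a name I made up.

```python
from math import isqrt

def find_triples(max_a, max_b):
    """List every (a, b, c) with 1 <= a <= max_a, 1 <= b <= max_b and a^2 + b^2 = c^2."""
    triples = []
    for a in range(1, max_a + 1):
        for b in range(1, max_b + 1):
            c = isqrt(a * a + b * b)  # whole-number part of the square root
            if c * c == a * a + b * b:
                triples.append((a, b, c))
    return triples

for triple in find_triples(10, 50):
    print(triple)
```

Changing the two arguments lets you widen the search and look for patterns in a longer list.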
Try to find some patterns in there. Look around for yourself. The first thing you might notice is that some of them are really repeats – like (3,4,5) and (4,3,5) are really the same thing. So we can whittle down our list some without losing any information – in situations like this mathematicians usually choose the one where the first number is smaller, so I’ll do that, but it doesn’t really matter. Here’s the new list we got:
(3,4,5), (5,12,13), (6,8,10), (7,24,25), (8,15,17), (9,12,15), (9,40,41), (10,24,26)
Then, I look again for a pattern. It definitely looks like a lot of chaos, but there is at least one more thing we can pick out. Notice that some of them are just “multiples” of others. Like (3,4,5) can be made into (6,8,10) by doubling everything, and (9,12,15) is made by tripling everything. In fact, this motivates our first piece of knowledge:
Lemma: If (a,b,c) is a triple, then so is (na,nb,nc) for any positive whole number n.
Proof: We can see by simplifying that
(na)^2 + (nb)^2 = n^2 a^2 + n^2 b^2 = n^2 (a^2 + b^2) = n^2 c^2 = (nc)^2.
So the necessary equation is true, which means (na,nb,nc) is a triple, and we are done.
(Side note: Mathematicians use the words lemma, proposition, and theorem all to mean “a true statement.” The connotation of “lemma” is that this is a smaller “piece” that helps us get to some bigger, more important thing. A proposition and a theorem are the “bigger things”, and theorems are bigger and more important than propositions. They have nothing to do with “difficulty” per se, just how important they are to the questions we want to answer.)
This is a good step. What this means now is we can reduce our list even more, to the triples where the three numbers don’t have a common factor. These have a special name, called primitive triples. Now, we hit on a big math idea – that of building blocks. With a little bit of effort, the lemma we just found basically tells us that every Pythagorean triple is either primitive or is a multiple of a primitive triple. Therefore, if we can list all the primitive triples, we actually know all the triples. Going back to our list, we reduce down to the primitive triples…
(3,4,5), (5,12,13), (7,24,25), (8,15,17), (9,40,41)
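The “building block” idea can be sketched in a few lines of Python: divide out the largest common factor, and what is left is the primitive triple underneath. The function name primitive_part is hypothetical, chosen only for this illustration.

```python
from math import gcd

def primitive_part(triple):
    """Return (primitive triple, multiplier) for a Pythagorean triple."""
    a, b, c = triple
    g = gcd(gcd(a, b), c)  # largest factor shared by all three numbers
    return (a // g, b // g, c // g), g

print(primitive_part((6, 8, 10)))   # ((3, 4, 5), 2)
print(primitive_part((9, 12, 15)))  # ((3, 4, 5), 3)
print(primitive_part((5, 12, 13)))  # ((5, 12, 13), 1)
```

A multiplier of 1 means the triple was already primitive.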
As we have just seen, our original question “what are all the Pythagorean triples?” (which from now on will just be called triples) has been reduced to “what are all the primitive triples?” This turns out to be a question which can be addressed more directly.
To see the next pattern, we will now let a be larger than b again. So, we have a list
There is now another, not-so-obvious pattern here. Look at the a’s. We have a pattern 3,4,5,7,8,9,… 10 gets skipped, and if you keep going you see that 14 is also skipped. So 2, 6, 10, and 14 get skipped. That’s a pattern, or at least it looks like one. When you look at these numbers, this might give us an idea: if (a,b,c) is a primitive triple, neither a nor b can be a number like 4n+2. This turns out to be true:
Fact: If (a,b,c) is a primitive triple, then neither a nor b can be written in the form 4n+2 for n a whole number.
Proof: This time, I’m going to intentionally leave a few details out so that anyone who is curious can fill them in.
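Before hunting for the missing details, it can be reassuring to check the fact numerically. The Python sketch below collects every leg of every primitive triple with both legs up to a chosen bound, then looks for any leg of the form 4n+2. The function name is made up for the occasion, and a finite search like this is not a substitute for the proof.

```python
from math import gcd, isqrt

def legs_of_primitive_triples(limit):
    """Collect every a and b (both at most limit) that appear in a primitive triple."""
    legs = set()
    for a in range(1, limit + 1):
        for b in range(1, limit + 1):
            c = isqrt(a * a + b * b)
            # If a and b share no factor, then a, b, c share no factor either.
            if c * c == a * a + b * b and gcd(a, b) == 1:
                legs.add(a)
                legs.add(b)
    return legs

legs = legs_of_primitive_triples(200)
print(sorted(legs)[:10])
print([leg for leg in legs if leg % 4 == 2])  # legs of the form 4n+2, if any
```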
Now, where can we go from here? We’ve started looking at primitive triples. As a mathematician, this is good progress, but we are not done yet. I encourage the reader to think some about this, and in a second post on this question I will demonstrate a method of finding all of the primitive triples.
I’ve spent some time thinking about how I want to present things I love on this blog, and one format I’ve come up with is a Problem/Solution series. The idea here is to have two posts with the same name – in one of them, I will explain an interesting problem and try to give some prodding questions about how an interested reader might play around with it. Then, in a second post, I’ll try to show how a mathematician might look at the problem, and then I’ll provide a solution to the problem.
The goal here is twofold. Firstly, I’d like to give people an opportunity to try things for themselves. Think of this like a puzzle – you don’t have to know much math to give things a try. Full answers often take a very long time to find, but even finding part of the answer can be very gratifying. The second reason I like this format is that I can use it to give anyone reading a “look into the mind” of a mathematician. I’ll build things up one step at a time, hopefully shedding light on important ideas along the way and showing how the math really does have a story behind the scenes. I’d strongly encourage playing with problems on your own before reading my answer. You’ll get a lot more out of it that way, even if it’s just for 5 minutes.
Anyways, on to the first problem! For the math people out there, the “name” of the problem I will be laying out is the classification of all Pythagorean triples. If you don’t know what that means, don’t worry about it. It will be explained.
In a previous post (So What is a Proof?), I showed why the Pythagorean Theorem is true. So, we know that now. But we can keep asking questions. For example, we now have an equation about right triangles:
a^2 + b^2 = c^2
Now, like with any other equation, it is natural to try to find some solutions. We can choose any values for a and b that we want, and then ask what value c is. Well, we can do a “square root” on both sides of the previous equation, and now we get
c = √(a^2 + b^2)
We might ask what sort of number c is, and we then realize that c is the distance between two points on that triangle, and so we now have learned how to calculate distances! If you are into geometry, you might want to know if there is a version of this for triangles that do not have 90 degree angles (there is, it’s called the Law of Cosines for those who are interested). We might have noticed that we can divide both sides by c squared and rearrange a bit to get a new equation
(a/c)^2 + (b/c)^2 = 1
Now this looks like a question about fractions. Well, we never actually said that a and b had to be whole numbers, so maybe it isn’t really a fraction… but what if it were? This is the question that comes most naturally to me, because I am fascinated by whole numbers. Might I be able to find some whole numbers that make this equation work?
This isn’t immediately obvious one way or the other. Equations can be rather unpredictable, especially when you want every number to be a whole number. I’ll give an example to show what I mean. One of the questions related to this which people began talking about a long time ago was “What happens if you change the 2 to something else?” So what about an equation like
a^3 + b^3 = c^3
or any other value in place of 2… maybe those have some whole-number solutions too? As we will soon see, when we leave the exponent as 2, there are plenty of whole-number solutions. What took hundreds upon hundreds of years to discover is that if you change the exponent to 3, or 4, or 5, or any whole number larger than 2, the number of solutions drops to ZERO. (Well, except for if you make a=0 and b=c, say, but that’s pretty boring, and when the exponent is 2 there are some not-so-boring examples.) For those who are interested, this is called Fermat’s Last Theorem, and we have only known this amazing fact since the mid-1990s, when Andrew Wiles’s proof was published. There are tons of good books and YouTube videos on this topic; I’d highly recommend surfing a few.
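To get a feel for how differently the exponents behave, here is a small Python experiment. It only searches a finite range, so it proves nothing about Fermat’s Last Theorem, and count_solutions is simply a name I picked.

```python
def count_solutions(exponent, limit):
    """Count pairs 1 <= a <= b <= limit where a^exponent + b^exponent is a perfect power c^exponent."""
    # a^k + b^k is at most 2 * limit^k, so c never needs to exceed 2 * limit.
    powers = {c ** exponent for c in range(1, 2 * limit + 1)}
    count = 0
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            if a ** exponent + b ** exponent in powers:
                count += 1
    return count

for exponent in (2, 3, 4, 5):
    print(exponent, count_solutions(exponent, 50))
```

With exponent 2 the search turns up solutions right away, while for the larger exponents it comes back empty in this range.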
The main point of that excursion is just to say that equations don’t always play nice. Weird things happen. Even with the regular equation, most of the time you don’t get whole numbers. But what about this equation? What kinds of answers does it have? Now, I can say why I described the problem the way I did. If we find three positive whole numbers (a,b,c) that make the equation
a^2 + b^2 = c^2
true, we call (a,b,c) a Pythagorean triple (since it’s three numbers that satisfy Pythagoras’ Theorem). Another way we could ask this question might be what are all the right triangles where every side is a whole number of units long?
Try it yourself! If you check out the post with the same title, but with the word “Problem” replaced by “Solution” (if I’ve posted it yet), you’ll be able to see a complete answer, along with all the steps a mathematician might take to get there.