# Critical Thinking Toolkit: Weighing Evidence with Bayes’ Theorem

When we are involved in important discussions, we should not take pivotal claims – others’ or our own – merely at face value. Instead, we debate the available ideas. That is, we bring forth evidence that we have thought about and that we believe supports our position, and we listen to the evidence presented by those who disagree with us. We ask them difficult questions about what they believe, and they ask us difficult questions about what we believe. In all of this, the big question lies in trying to evaluate which of the two positions is better.

This is especially important if we are undecided on some idea and we are listening to two intelligent people express their opinions. Suppose, for example, that Jim doesn’t have any developed opinion on an important political topic in the upcoming presidential election. The most responsible thing Jim could do would be to seek out the best and most intelligent proponents of the major positions within his country on that issue (note that this might not be the candidates themselves) to hear the best evidence that each perspective has to offer. How is Jim to decide which position to take?

Of course, in this situation a lot depends on what the issue is. I don’t claim to provide an analysis that will work in absolutely every situation. To be more specific, I don’t claim to explain in detail what kinds of evidence are valid or invalid in different situations. However, as a mathematician, I can provide insight on a precise way to evaluate evidence once you have it – a method based on foundational concepts in the mathematical theory of probability.

## Why Apply Probability to Decision-Making?

The first question we might ask is why we bring probability into the picture at all. You might say that since these questions have right answers, all the probabilities are either 100% or 0%. In a sense, this is correct. However, for very nearly every question we ever think about, arriving at this all-or-nothing certainty would require being nearly omniscient and absolutely free of all psychological bias. A little time spent learning about psychology will debunk the idea that we are unbiased – each and every one of us carries plenty of bias. And I hardly need to justify the claim that none of us knows absolutely everything there is to know about a given often-debated question – that just isn’t possible. So, while it is “technically correct” in one sense that everything reduces to 0% or 100%, it is entirely unhelpful to think of things this way, because none of us can actually arrive at a truly 100% answer to any complicated question.

What then do we do? One extremely helpful step is to bring in probability theory. In mathematics, probability theory is the study of how likely things are and of the proper ways to determine how likely something is to be true (among other related ideas). It is a highly developed and sophisticated field – you can earn a Ph.D. studying tiny slivers of its broad landscape. Fortunately, we don’t need all of that machinery – the foundational tool of probability theory that we need for these kinds of debates doesn’t use any horribly complicated ideas. The tool I have in mind is called Bayes’ theorem, which serves as the foundation of disciplines such as Bayesian statistics. Before I explain how this affects how we ought to view evidence in day-to-day life, let me spend a moment developing Bayes’ theorem itself.

## What is Bayes’ Theorem?

In probability theory, the most important notation of all is the way we denote probabilities of various things. Usually, instead of writing out full sentences to describe the situations we care about, mathematicians use a one-letter shorthand for those situations. Here, I will use $A$ to denote some kind of claim that someone makes to you. We then define $P(A)$ – a number ranging anywhere from 0 to 1 – to be the probability that $A$ is true. For example, if $A$ is “the coin I just flipped landed on heads”, then $P(A) = \frac{1}{2}$, which means the odds are 50-50. If instead $A$ is “I rolled a six on a standard 6-sided die”, then $P(A) = \frac{1}{6}$.

Another important idea in probability theory is the introduction of the word “not” into the vocabulary. In the context of the 6-sided die, it is clear enough that either I will roll a six, or I will not roll a six. Since one of those two absolutely must happen, their combined probability must be 100%, or $P = 1$, and since it is impossible for both to happen at the same time, probability theory dictates that

$P(A) + P(\text{opposite of } A) = 1.$

If we are in Jim’s situation from earlier and are trying to decide whether a particular claim by someone else (let’s call it $A$) is true or false, we care about both of the values $P(A)$ and $P(\text{opposite of } A)$. Often, in mathematics, instead of writing out ‘opposite of $A$‘ we instead use $A^c$. The standard name for $A^c$ is the complement of $A$ – so the letter $c$ stands for ‘complement,’ though you can just as well think of it as ‘contrary to $A$,’ or simply ‘opposite of $A$.’ The equation above can then be expressed as $P(A) + P(A^c) = 1$.
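The complement rule is easy to check numerically. Here is a minimal Python sketch using the die example from earlier (the variable names are my own, not standard notation):

```python
# Complement rule: P(A) + P(A^c) = 1.
# A = "rolled a six on a fair 6-sided die".
p_a = 1 / 6             # P(A)
p_a_contrary = 1 - p_a  # P(A^c), forced by the complement rule

# Together the two probabilities account for every possible outcome.
assert abs(p_a + p_a_contrary - 1) < 1e-12
print(p_a_contrary)  # 5/6, i.e. about 0.833
```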

We now run into a harder question – how does evidence factor into the equation? It is certainly true that the value of $P(A)$ is related to how much evidence we have available to us and how convincing we find that evidence. Probability theory also has a way to write this. Say that $A$ is the thing that you want to know about, and $E$ is the totality of evidence you have that relates to $A$. This is all evidence – evidence for $A$ being true and evidence for $A$ being false alike. When a probability theorist wants to express the likelihood of $A$ being true based upon the evidence $E$ available to us, they write $P(A | E)$, which is read aloud as “the probability of $A$ given $E$.”
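Conditional probability can be made concrete by simple counting. The sketch below (my own example, reusing the die from earlier) computes the probability of a six given the evidence that the roll was even:

```python
from fractions import Fraction

# P(A | E) by counting: A = "rolled a six", E = "the roll was even".
outcomes = [1, 2, 3, 4, 5, 6]  # all equally likely rolls
evens = [x for x in outcomes if x % 2 == 0]

# Restrict attention to the outcomes where the evidence E holds.
p_a_given_e = Fraction(evens.count(6), len(evens))
print(p_a_given_e)  # 1/3 – the evidence raised P(A) from 1/6 to 1/3
```

Learning that the roll was even rules out half of the outcomes, so the probability of a six doubles.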

When we write things in this way, how then do we determine $P(A | E)$? This is where Bayes’ Theorem comes in. This is a mathematical theorem that tells us how to compute $P(A | E)$ using other probabilities. The idea is as follows. If we want to know how likely $A$ is given evidence $E$, we want to look at how $A$ increases or decreases the likelihood of finding $E$. In other words, we want to know how well $A$ being true explains why we find $E$, and we compare that to how well alternatives to $A$ explain why we find $E$. The mathematical statement of Bayes’ theorem is given below:

Bayes’ Theorem: Whenever $P(E) > 0$, the following formula holds:

$P(A | E) = \dfrac{P(E | A) P(A)}{P(E)}.$

To see how this works, say that we are trying to determine whether $A$ or $B$ happened, and we are using the evidence $E$ available to us to try to decide. Then we can use Bayes’ theorem to calculate $P(A | E)$ and $P(B | E)$, and whichever of these options is larger is the more probable explanation.
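Here is a small Python sketch of exactly this comparison, with made-up numbers: suppose $A$ is “the coin is fair,” $B$ is “the coin is two-headed,” and the evidence $E$ is three heads in a row. The priors below are assumptions chosen purely for illustration.

```python
# Comparing two hypotheses with Bayes' theorem (illustrative numbers only).
p_a = 0.99              # prior P(A): the coin is fair
p_b = 0.01              # prior P(B): the coin is two-headed
p_e_given_a = 0.5 ** 3  # P(E | A): a fair coin gives three heads 1/8 of the time
p_e_given_b = 1.0       # P(E | B): a two-headed coin always shows heads

# P(E) by the law of total probability, assuming A and B are the only options.
p_e = p_e_given_a * p_a + p_e_given_b * p_b

p_a_given_e = p_e_given_a * p_a / p_e
p_b_given_e = p_e_given_b * p_b / p_e

print(f"P(A | E) = {p_a_given_e:.3f}")  # about 0.925 - fair coin still wins
print(f"P(B | E) = {p_b_given_e:.3f}")  # about 0.075
```

Even though the two-headed coin explains the evidence perfectly, its tiny prior keeps it the less probable hypothesis after only three heads.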

Of course, in most situations it doesn’t make a whole lot of sense to use actual numbers, because in the vast majority of situations, exact numbers are an oversimplification. However, the theorem still provides the correct conceptual framework within which to think about evidence. To see why it is helpful, let’s work through an example.

## Winning the Lottery

Let’s say you bought a lottery ticket. You are obviously excited and hoping to win – let’s use the shorthand $W$ to denote your ticket being the winner. Now, we can imagine thinking about our likelihood of winning the lottery in four different situations. In all cases, suppose your odds of winning are exactly one in a million. (Note: the first situation is labelled ‘Situation 0’ since nothing really happens there.)

Situation 0: All you know is that you have a ticket and 1-in-a-million odds. Use $E_0$ as shorthand for the evidence available to us in Situation 0.

Situation 1: Your friend tells you they think you won! You ask why, and they say, “Oh, I just have a really good feeling about it.” Use $E_1$ as shorthand for the evidence available to us in Situation 1.

Situation 2: Your friend tells you that they saw the lottery number in the news, and they remembered a few of the numbers on your ticket and that you got those right, but they don’t remember the rest of your ticket. Use $E_2$ as shorthand for the evidence available to us in Situation 2.

Situation 3: You watch the news and see the winning lottery number announced, and you check your ticket and the numbers all match! Use $E_3$ as shorthand for the evidence available to us in Situation 3.

One important thing to note upfront about all of this is that $P(W)$ is the same in all four situations – it is exactly one chance in a million. But, as we learn more and more about the winning ticket number, the way we think about our chances of winning changes. Here is how one might think about the various ways of calculating $P(W | E)$ for these situations.

• In Situation 0, we have no evidence at all, so $P(W | E_0)$ is just $P(W)$. Since we don’t have any evidence, nothing changes.
• In Situation 1, the evidence from our friend is pretty unconvincing. But there is a slim, slim chance that they already know you’re going to win for some reason and are just trying to conceal their excitement. This scenario is very unlikely, but it is possible, so $P(W | E_1)$ is a minuscule amount larger than $P(W)$ – not enough to make any real difference.
• In Situation 2, we have some concrete information from a fairly reliable source. We don’t have enough to know whether we won, but $P(W | E_2)$ will be much larger than $P(W)$, while still being pretty small. Perhaps it will be something around 0.1% – much better than the 0.0001% we started with, but still not particularly good.
• In Situation 3, if we read the TV screen correctly, and if it was really a news channel, then we almost certainly won the lottery. So $P(W | E_3)$ is almost all the way up to 100% – the only doubt left is whether we read our ticket correctly. Perhaps it is about 99.9%.

In terms of Bayes’ theorem,

$P(W | E_0) = \dfrac{P(E_0 | W)\,P(W)}{P(E_0)} \approx \dfrac{1 \cdot P(W)}{1} = P(W),$

$P(W | E_1) = \dfrac{P(E_1 | W)\,P(W)}{P(E_1)} \approx \dfrac{1 \cdot P(W)}{\text{number barely less than 1}} = P(W) + \text{very small number},$

$P(W | E_2) = \dfrac{P(E_2 | W)\,P(W)}{P(E_2)} \approx \dfrac{1 \cdot P(W)}{0.001} = 1000\,P(W) = 0.1\%,$

$P(W | E_3) = \dfrac{P(E_3 | W)\,P(W)}{P(E_3)} \approx \dfrac{0.999 \cdot P(W)}{P(W)} = 99.9\%.$
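Plugging these rough numbers into code makes the jumps easy to see. The evidence probabilities below are illustrative guesses consistent with the percentages above, not measured values:

```python
# Bayes' theorem with the lottery example's rough, assumed numbers.
p_w = 1e-6  # P(W): one-in-a-million prior

# Situation 2: a few remembered digits match. Guess P(E_2 | W) ~ 1 and
# P(E_2) ~ 0.001 (roughly the chance a random ticket matches those digits).
p_w_given_e2 = 1.0 * p_w / 0.001

# Situation 3: every number matches the broadcast. Guess P(E_3 | W) ~ 0.999
# (we might misread the ticket) and P(E_3) ~ P(W) (only winners see a full match).
p_w_given_e3 = 0.999 * p_w / p_w

print(f"P(W | E_2) = {p_w_given_e2:.4%}")  # 0.1000%
print(f"P(W | E_3) = {p_w_given_e3:.1%}")  # 99.9%
```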

Something that is also worth noting here is that nothing particularly “extraordinary” was required to leap from 0.0001% all the way up to 99.9%. All that happened is that we watched the news, and the fact that the news announced the correct lottery number is not surprising at all. This pushes back against the mantra that “extraordinary claims require extraordinary evidence.” That slogan neglects what Bayes’ theorem actually tells us: even ordinary evidence can be extremely powerful when it is far more likely under one hypothesis than under the alternatives.

## Conclusion

When thinking about how evidence affects the likelihood of various beliefs, it is important to remember the balance that Bayes’ theorem gives us. Every idea must be thought of as likely or unlikely not in a vacuum, but in light of the available alternative ideas.