# COVID Testing and Bayes’ Theorem

Yesterday, an interesting conundrum came to me. Sometimes, people take two COVID tests on the same day. Imagine that one came back negative and the other came back positive… which can and does happen. Here is the tricky question… which one do you trust? The positive or the negative? You might think there is no way to compare them… but you’d be wrong. Mathematicians have an entire theory for dealing with problems exactly like this one. It is called Bayesian probability theory.

## What is Bayesian Probability?

There isn’t any need to go into the incredible depth of what Bayesian probability is capable of, and so we won’t. Actually, all we need is a brief account of what Bayesian probability is meant to do and a brief explanation of how to do it. I can then walk you through how I, with my mathematical training, could go about evaluating which of the test results is probably right and which is probably wrong.

In short, Bayesian statistics or Bayesian probability is a theory of how evidence works. More specifically, Bayesian probability theory explains how evidence and probability interact and how much influence new evidence should have on your current beliefs. This theory allows us to analyze which pieces of evidence are more important than others and how to correctly incorporate new information into the overall picture.

I will first use a simplified example to show how the theory works. Suppose I ask you whether or not it will rain tomorrow in your hometown. Since (presumably) you’ve lived in your hometown for a large portion of your life, you have a pretty good idea of the likelihood that it will rain on a randomly chosen day. If you have to answer my question without looking up the weather report, you will probably give me that random probability. This random probability is called the prior probability (it is called this because it is prior to looking at any evidence). Your knowledge about your hometown’s average weather patterns is called the background information – it isn’t really evidence, it is rather the overall backdrop that you use to consider your evidence.

Now, suppose you open up a weather app. You are now gathering evidence. In Bayesian probability, evidence just means “anything not already in your background information.” Evidence can point in either direction – in fact, you can have two pieces of evidence that point in opposite directions. For instance, imagine you open your weather app and you see a picture of a raincloud. That evidence would serve to increase the likelihood that it will rain tomorrow. To be more specific, it doesn’t actually affect whether or not it will rain tomorrow… obviously. But it should cause you to shift your opinion towards answering yes rather than no. The new probability that you have after this evidence is called the posterior probability (so called because it is post-evidence, that is, after looking at evidence).

But evidence isn’t final in this theory. We can always find new evidence and factor that in. Perhaps, for instance, you look closer and your weather app says there is a 30% chance of rain tomorrow, and you know from living in your hometown for years that these low numbers usually do mean it won’t rain. Now you have received a new piece of evidence that should again shift the way you think about the question I asked you.

This is the natural way we process evidence. We all understand that seeing the raincloud on the weather app does give us some reason to think it will rain tomorrow, and we all understand that the 30% is an even better piece of evidence that, although it might rain tomorrow, it probably will not. Notice how after every new piece of evidence, the way we think about the situation changes. If you were to look at a different weather service and saw a 70% chance of rain, then your opinion should shift to account for that new information. When a mathematician, scientist, or philosopher talks about Bayesian probability or Bayesian statistics, all we mean is that we are using a mathematical theory that helps us keep all of our information up to date with the broad scope of evidence available to us.

## Why Use This Stuff Anyways?

You’d be very surprised how many people use Bayesian thinking. I don’t think many of us would be surprised to see a scientist or mathematician using this method of thinking… but how about historians, philosophers, and detectives? Well, they do! Whenever you hear somebody speaking about the best way to explain something or the most likely explanation of something, very often they are using a process much like Bayesian probability. The strength of this way of doing mathematics is that we can directly compare pros and cons. In the weather example, we can figure out whether rain or no rain tomorrow is a better explanation of why we see the forecasts that we do. In history, if your sources give you slightly different information about a historical event, you want to know which course of events would most reasonably explain how those differences emerged. As a philosopher, you want to understand which ideas fit better with the observations we can all agree on about the world. As a detective, you want to know which suspect better fits the evidence.

Notice in all of this that I have never once required that the match be perfect. Bayesian probability cares nothing for perfection. This method is designed specifically to deal with imperfect, messy situations. And that is what we are always dealing with, day in and day out.

## How Does the Math Work?

The cornerstone of Bayesian probability theory is called Bayes’ Theorem. Before I can state what this is, I need to introduce the notation we use to write it down. When I use the capital letter $P$, this is always referring to the probability of something or other. Capital letters other than $P$ are used as shorthand for some event we care about – like whether or not it will rain. For example, I might use $A$ as shorthand for “it will rain tomorrow,” and then $P(A)$ would be the likelihood/probability that it will rain tomorrow. If I want to talk about the opposite of something, we can just put the word ‘not’ on the front – so $P(\text{not }A)$ is the likelihood that it will not rain tomorrow. (It is sometimes convenient to use a minus sign or some other symbol instead of the word not, but I won’t do that here.)

The last bit we need is called conditional probability. The notation I will use is $P(A|B)$, which you should read as “the probability of $A$ given $B$.” The key word there is given. When we ask about $P(A|B)$, we aren’t just asking about the likelihood of $A$ – we are asking about the likelihood of $A$ given that we already know $B$. This is the aspect of Bayesian probability theory that makes it possible to take new evidence into account. In the weather example, $B$ would be the weather reports we are looking at.

With all of this written down, I can now express Bayes’ Theorem. Before I write it in symbols, I’ll describe it in words. The goal is to calculate $P(A|B)$. The idea is that we can use the fact that we already know $B$ to get a head start. Since we already know $B$ happened, we can eliminate from consideration all possible situations where $B$ doesn’t happen. What we do now is flip the order of the letters for a bit. We want to ask whether $A$ would have made $B$ likely to happen – in other words, we want to know $P(B|A)$. If we also know $P(A)$, the likelihood of $A$ happening at all, then $P(B|A)P(A)$ tells us the proportion of situations where both $A$ and $B$ happen. Likewise, $P(B|\text{not }A)P(\text{not }A)$ tells us the proportion of situations where $B$ happens but $A$ doesn’t. Since $A$ either happens or doesn’t, those are the only two possibilities – which means those two proportions together account for every situation where $B$ happens, so they add up to $P(B)$. To compute the overall probability, we pick the case we want to know about – the case where $A$ actually happens – and divide it by all the situations where $B$ happens, which is just $P(B)$. What we’ve done is leverage the fact that we already know $B$ happened to figure out $P(A|B)$. Written as a formula, the preceding discussion gives us exactly what we wanted to know.

Bayes’ Theorem: For any two events $A$ and $B$, $P(A|B) = \dfrac{P(B|A) * P(A)}{P(B)}.$

To see how this works, let’s go back to our weather example. In that example, $A$ represents “it will rain tomorrow” and we will say $B$ represents “the weather forecast says it will rain tomorrow.” The values of $P(A)$ and $P(B)$ are found using background information – so $P(A)$ would be the likelihood it will rain on a random day, and $P(B)$ would be the percentage of days on which weather services predict rain in your town. You could count up a few months of old weather predictions to find $P(B)$, and you could count up a year’s worth of actual rainy days to find $P(A)$. Now, $P(B|A)$ would be the likelihood that the weather service would predict rain if in fact it will rain tomorrow. That should be a reasonably high number. Let’s say, just to put numbers to it, that $P(B|A)$ is 90% – so on rainy days, the previous day’s forecast had predicted that rain 9 times out of 10. Let’s say that in your town it rains on 15% of days overall and that your weather service predicts rain on about 18% of days overall.

Then $P(A|B)$, the probability that it actually will rain tomorrow given the forecast, is $P(A|B) = \dfrac{P(B|A)*P(A)}{P(B)} = \dfrac{0.9*0.15}{0.18} = 0.75.$
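As a sanity check, the same calculation fits in a few lines of Python. This is just a sketch of the arithmetic above; the function name `posterior` is mine, and the numbers are the illustrative values from the weather example, not real forecast statistics:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
def posterior(p_b_given_a, p_a, p_b):
    """Probability of A given B, computed via Bayes' theorem."""
    return p_b_given_a * p_a / p_b

# Illustrative numbers from the weather example.
p_rain = 0.15                  # P(A): it rains on 15% of days
p_forecast = 0.18              # P(B): rain is forecast on 18% of days
p_forecast_given_rain = 0.90   # P(B|A): forecasts caught 9 of 10 rainy days

print(round(posterior(p_forecast_given_rain, p_rain, p_forecast), 2))  # 0.75
```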

This means that, three times out of four, it actually will rain tomorrow. Notice the cool thing about this – I never actually used the weather service’s posted probability to do any of this. I came up with the likelihood entirely on my own. This is the power of Bayesian probability.

Back to the original situation now. You’ve taken two back-to-back tests. One is positive, one is negative. What to do? Bayesian probability, that’s what!

Remember that the first step in any Bayesian problem is to set up your background information. The first piece of background information would be where you live. You’d want to estimate the probability that a random person near you currently has COVID-19. You could use your city, the area of your city you live in, or perhaps just your college campus if you live on a campus and never leave. I will call the probability of a random nearby person being sick $p$. The second piece of background information is the quality of the test you use. The relevant term is the specificity of the test – which tells you how likely the test is to give you a negative result if you are truly not sick with COVID. I will call this factor $\sigma$. Notice that the specificity is phrased in Bayesian terms already – it is the likelihood that your test will come back negative given that you are not sick. We will use this later. Lastly, we need to know the rate of positive tests in the area you live in – I’ll just call this $q$. To make everything easier to write down, I will use $+$ as shorthand for testing positive and $-$ for testing negative.

We now carry forward. What we want to know is the likelihood of obtaining a certain test result given your actual health condition. There are four probabilities that need to be calculated: $P(+|sick), P(+|well), P(-|sick)$, and $P(-|well)$. Now, we call on $\sigma$, the specificity of the test. This is the probability that you will test negative given that you are well. We’ve been using the notation $P(-|well)$ for this. This means that $P(-|well) = \sigma$. Now, three to go.

We can now use a principle of probability that we mentioned earlier. Either an event happens, or it doesn’t. Pretty simple. In probability language, this means the probabilities of opposite events add up to 1. Since $+$ is the opposite of $-$, $P(+|well)$ is the opposite of $P(-|well)$. This means that $1 = P(-|well) + P(+|well)$. We now know that $P(+|well) = 1 - \sigma$. Now we know two of the four.

This is as far as we can go without using Bayes’ theorem. I’ll now use Bayes’ theorem to calculate $P(well|-)$. The theorem tells us that $P(well|-) = \dfrac{P(-|well)*P(well)}{P(-)}.$

Now, we need to fill in the blanks. We already know that $P(-|well) = \sigma$. Our background information gave us the value of $P(well)$, which is $1-P(sick) = 1-p$. The background information similarly tells us that $P(-) = 1 - q$. Therefore, $P(well|-) = \dfrac{\sigma (1-p)}{1-q}.$

We now use the same “adding up to one” trick we used before to see that $P(sick|-) = 1 - \dfrac{\sigma (1-p)}{1-q}$. This trick has brought the “sick” category back into the formula, which is what we needed. We now need Bayes’ theorem again to calculate $P(-|sick)$. Using the background information much the same way as before, $P(-|sick) = \dfrac{P(sick|-)*P(-)}{P(sick)} = \dfrac{\bigg( 1 - \dfrac{\sigma (1-p)}{1-q} \bigg) * (1-q)}{p} = \dfrac{(1-q) - \sigma(1-p)}{p}.$
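To see these two steps with concrete numbers, here is a minimal sketch. The values of $\sigma$, $p$, and $q$ below are hypothetical stand-ins I chose for illustration (95% specificity, 10% prevalence, 12% positive-test rate), not real data:

```python
# Hypothetical inputs: specificity, prevalence, positive-test rate.
sigma, p, q = 0.95, 0.10, 0.12

# Step 1: Bayes' theorem gives P(well|-) = sigma * (1 - p) / (1 - q).
p_well_given_neg = sigma * (1 - p) / (1 - q)

# Step 2: the "adding up to one" trick, then Bayes' theorem again
# to recover P(-|sick) = P(sick|-) * P(-) / P(sick).
p_sick_given_neg = 1 - p_well_given_neg
p_neg_given_sick = p_sick_given_neg * (1 - q) / p

print(round(p_well_given_neg, 4), round(p_neg_given_sick, 4))  # 0.9716 0.25
```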

Notice that if we use the original meanings of $\sigma, p, q$, then this means that $P(-|sick) = \dfrac{P(-) - P(-|well)P(well)}{P(sick)}.$

This other way of forming the expression is easier to read. To find $P(-|sick)$, we count all the negative tests, subtract away those people who are actually well, and divide by the number of sick people. This is actually pretty intuitive when you slow down and think about it – we can describe the whole process in terms of counting people – and yet look at how far it has gotten us! As before, the fourth number, $P(+|sick)$, can be found using the “adding up to 1” trick.
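Because the formula can be read as counting people, we can check it with an explicit head-count. The town below is hypothetical: 1000 people, with prevalence $p = 0.10$, positive-test rate $q = 0.12$, and specificity $\sigma = 0.95$:

```python
# Hypothetical town: p = 0.10, q = 0.12, sigma = 0.95.
population = 1000
sick = 100                  # p * population people are sick
well = population - sick    # 900 people are well
negatives = 880             # (1 - q) * population negative tests
well_and_negative = 855     # sigma * well well people test negative

# P(-|sick): all negative tests, minus the well people among them,
# divided by the number of sick people.
p_neg_given_sick = (negatives - well_and_negative) / sick
print(p_neg_given_sick)  # 0.25
```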

We now have all the numbers we need: $P(-|well) = \sigma,$ $P(+|well) = 1 - \sigma,$ $P(-|sick) = \dfrac{(1-q) - \sigma(1-p)}{p},$ $P(+|sick) = 1 - \dfrac{(1-q) - \sigma(1-p)}{p}.$

Now what? Well, assuming the two tests are independent of each other, the likelihood of your pair of test results if you are actually well is the product of the two probabilities given that you are well. In Bayesian terms, this is $P(-|well)*P(+|well) = \sigma(1-\sigma).$

Likewise, the likelihood of your test outcomes if you are sick is $P(-|sick)*P(+|sick) = \dfrac{(1-q) - \sigma(1-p)}{p} \bigg(1 - \dfrac{(1-q) - \sigma(1-p)}{p}\bigg).$

We are essentially done now. All that remains is to compare the two values. Whichever one is larger is the more likely of the two.
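The whole comparison fits in one small function. This is a sketch under the independence assumption discussed above; the function name `more_likely_state` and the example inputs (95% specificity, 10% prevalence, 12% positive rate) are mine:

```python
def more_likely_state(sigma, p, q):
    """Compare the two hypotheses for one negative and one positive test.

    sigma -- specificity, P(-|well)
    p     -- prevalence, P(sick)
    q     -- overall positive-test rate, P(+)
    """
    # Likelihood of the mixed pair of results if you are well.
    well_likelihood = sigma * (1 - sigma)
    # s = P(-|sick), from the formula derived in the text.
    s = ((1 - q) - sigma * (1 - p)) / p
    sick_likelihood = s * (1 - s)
    return "sick" if sick_likelihood > well_likelihood else "well"

print(more_likely_state(0.95, 0.10, 0.12))  # sick
```

With these particular inputs, the mixed pair of results is better explained by actually being sick – a high-specificity test rarely produces a false positive, so the positive result carries more weight than the negative one.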

This is about as far as we can go with these simple probability methods. But, there is more to be said. First, we might notice that if we treat $\dfrac{(1-q) - \sigma(1-p)}{p}$ as a single unit, perhaps call it $s$, then the likelihood of your outcome if you are actually sick is $s(1-s)$. We can view this whole problem now in terms of values of a function, $f(x) = x(1-x)$. Using a different part of mathematics called calculus, we can discover that the biggest possible value of this expression is precisely $1/4$, which happens when $x = 1/2$. We also learn from calculus that the closer $x$ is to $1/2$, the bigger $f(x)$ is. This means that all you really need to do is to find $s$ and figure out whether it is closer to $1/2$ than $\sigma$ is.
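As a quick numerical check of this shortcut: comparing $f(s)$ with $f(\sigma)$ gives the same verdict as comparing how close $s$ and $\sigma$ are to $1/2$. The sample pairs below are arbitrary values chosen for illustration:

```python
def f(x):
    """f(x) = x * (1 - x); largest at x = 1/2, where it equals 1/4."""
    return x * (1 - x)

# For each pair, the likelihood comparison and the distance-to-1/2
# comparison agree.
for s, sigma in [(0.25, 0.95), (0.4, 0.3), (0.6, 0.45)]:
    by_likelihood = f(s) > f(sigma)
    by_distance = abs(s - 0.5) < abs(sigma - 0.5)
    print(s, sigma, by_likelihood, by_distance)
```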