How Does Disease Spread?

These are difficult times. Because of the spread of the COVID-19 virus, we are all taking very drastic measures to ‘flatten the curve’: socially distancing, paying closer attention to hygiene, and much more. These measures should of course be taken, because they will save lives. But how, exactly, do we know this? We don’t quarantine for every disease; even something as horrible as Ebola never led to measures as drastic as those we now face. The question I want to address here is why this is so: how do we know, mathematically, when things are getting out of control? And how do mathematicians understand and calculate the way diseases spread? I will try to explain some of the basics of how this virus spreads.

I will also include some additional resources that discuss mathematics in ways that I find very helpful. In particular, I highly recommend the videos on the YouTube channel 3Blue1Brown. This channel does an amazing job with visual displays, far better than I could ever do, and very effectively explains the concepts behind the models.

Mathematical Modelling

Throughout this article, the concept of a mathematical model is absolutely central. A mathematical model is created by studying real-world patterns of something (disease spread, economic growth, population growth, gravity, anything at all really) and then attempting to capture the patterns we see using equations that mathematicians know have some of the same patterns.

The reason we make models is that they enable us to make predictions about the future. In many cases, we can actually learn more about the real-world events we are modelling than would have been possible by mere observation. Many of the sciences, especially physics, chemistry and biology, make frequent use of mathematical models to aid understanding in their fields.

Models can be extremely different from one another, since a model depends vitally on what is being modelled. For that reason, we will focus only on the ideas that play an important role in modelling the ways diseases spread.

Exponential Growth

The first thing to talk about is the exponential growth of diseases. Broadly speaking, when we say that a number is growing exponentially, we mean that the speed at which this number is growing is proportional to the number itself. The most common example of this kind of growth that we experience is population growth. The number of people in the next generation depends directly on how many people there are now: you would find the number of people in the next generation by multiplying the current number of people by the average number of children per person.

Let’s construct an example using some numbers that will be easier to calculate with. Suppose that an isolated island has a population of just two rabbits, and that every year, every pair of rabbits has twins. After one year, there would be four rabbits. After two years, each of the two pairs has had twins, and so there are now eight rabbits. When you continue the pattern, the number of rabbits doubles every year, and the pattern becomes

2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, …

You could have some rate other than doubling. For instance, you could have a model where the number of rabbits triples every year, and you’d then have a new pattern

2, 6, 18, 54, 162, 486, 1458, 4374, 13122, 39366, 118098, 354294, …

Something important to notice here: both of these lists show the first twelve values. Notice how massive a difference there is between doubling and tripling: we have gone from about 4,000 rabbits to more than 300,000! This discrepancy keeps getting worse as time goes on. To see why, we can describe both of these growth models with equations. Since in these lists we obtain each entry by multiplying the previous entry by a growth factor (2 for doubling and 3 for tripling), our lists actually look like this:

2, 2*2, 2*2*2, 2*2*2*2, … and 2, 2*3, 2*3*3, 2*3*3*3, 2*3*3*3*3, …

In equation form, after n years, the first list has entry 2 * 2^n, and the second list has entry 2 * 3^n. If we want to know the discrepancy from year to year as a percentage, we can calculate the ratio of the two:

\dfrac{2 * 3^n}{2 * 2^n} = \dfrac{3^n}{2^n} = \bigg( \dfrac{3}{2} \bigg)^n = (1.5)^n.

If we plug in n=1, for example, this ratio is 1.5, which means that there are 50% more rabbits in the second list than in the first. The last entries in the lists above correspond to n=11, and there the ratio is (1.5)^{11} \approx 86.5 (indeed, 354294 / 4096 \approx 86.5). This ratio keeps growing as the number of years increases; by year 12 it is already (1.5)^{12} \approx 130.
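To check the two rabbit lists and the ratio between them numerically, here is a short Python sketch. The function name `rabbit_population` is just an illustrative choice; it evaluates 2 * g^n for a growth factor g over years 0 through 11.

```python
def rabbit_population(growth_factor, years, start=2):
    """Population in years 0..years-1 when it is multiplied by growth_factor each year."""
    return [start * growth_factor ** n for n in range(years)]


doubling = rabbit_population(2, 12)   # last entry: 4096
tripling = rabbit_population(3, 12)   # last entry: 354294
print(doubling)
print(tripling)

# The ratio between the lists in year n is (3/2)^n = 1.5^n
for n in (1, 11):
    ratio = tripling[n] / doubling[n]
    print(n, ratio, 1.5 ** n)
```

Both columns printed in the loop agree, which is just the identity (2 * 3^n) / (2 * 2^n) = 1.5^n checked on the actual list entries.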

Why Exponential Growth Matters for COVID-19

When we model a virus, exponential growth plays a front-and-center role. The reason for this is the way viruses spread: from person to person. To oversimplify, suppose a disease spreads in such a way that every sick person infects one other person each week (and stays sick), and that right now two people are sick with this disease. Then the list of the number of sick people after n weeks is the same as the list for the doubling rabbit population:

2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, …

This is because after one week, the initial two sick people each infected one person, and that means two new infected people, for a total of 4. The next week, each of the 4 infects one new person, so we have 4 new infected people for a total of 8 infected people, and so on.

This is the nature of how viruses spread. We can use this idea to develop our very first mathematical model of a virus (which will need to be improved later). Suppose the number of sick people is multiplied by some factor c every week; if each sick person infects k new people per week and stays sick, then c = 1 + k, so in the example above c = 2. If there is currently just one sick person, then the equation y = c^n tells us the approximate number of people who will be sick n weeks from today. If there are currently A people sick, we can modify this equation to y = A * c^n.
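A minimal Python sketch of this first model, comparing a week-by-week count with the closed form y = A * c^n. Here `c` is the factor by which the number of sick people is multiplied each week; when every sick person stays sick and infects `new_per_person` new people per week, that factor is 1 + new_per_person. Both function names are hypothetical helpers for illustration.

```python
def sick_after_weeks(A, c, n):
    """Closed form of the first model: y = A * c^n."""
    return A * c ** n


def simulate_weekly(A, new_per_person, weeks):
    """Count sick people week by week, assuming every sick person stays sick
    and infects `new_per_person` new people each week."""
    sick = A
    history = [sick]
    for _ in range(weeks):
        sick += sick * new_per_person   # last week's sick plus their new infections
        history.append(sick)
    return history


# Two people sick now, each infecting one new person per week (so c = 1 + 1 = 2)
weekly = simulate_weekly(A=2, new_per_person=1, weeks=11)
closed_form = [sick_after_weeks(A=2, c=2, n=n) for n in range(12)]
print(weekly)
print(weekly == closed_form)  # True: the step-by-step count matches y = A * c^n
```

The week-by-week loop reproduces the doubling list exactly, which is why the closed-form equation is a fair summary of this (oversimplified) spreading rule.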

Making the Model Better

It is important to point out that this is an oversimplification. If you look at real-life instances of disease spread, you will find that the equation y = A * c^n does a pretty good job of predicting things at the beginning, but becomes wildly inaccurate later on. This is because the number of sick people eventually begins to decrease, but an exponential model never does. This is the right place to begin, but we need to make some changes. As a mathematician would, we must ask ourselves why diseases eventually slow down, and once we know why, we can modify our equations.

There are two big errors we have made that need to be accounted for in order to make our model more accurate. The first thing we didn’t think about is the fact that people with a disease will either recover or die, and in both cases they are no longer contagious. This decreases the overall number of contagious people, and so it somewhat slows down the exponential increase. However, it turns out that even when you account for this, you still have a kind of exponential increase. Suppose, for example, that every sick person is contagious for 2 weeks and infects one person per week. Then if we start with one sick person, after 1 week there will be two. During the second week, two new people are infected, but the original person recovers, and so we now have three infected people. In the third week, three new people are infected, and the one person who became sick during the first week recovers, so we are now at 5 infected people. After the fourth week, there will be 5 new infections and 2 people recovered, and so there will be 8 infected people. In general, the number of infected people after n weeks is obtained by adding together the numbers from the two previous weeks. The result of this is the famous Fibonacci sequence:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, …

This is an improvement, but it is still too fast: mathematicians know that the Fibonacci sequence grows very much like the exponential function (1.6)^n (more precisely, like \phi^n, where \phi \approx 1.618 is the golden ratio). So, there must be something else we are missing.
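The two-week illness rule above can be simulated directly. This is a sketch under the same assumptions as the text (each person is contagious for exactly two weeks and infects one new person per week); the function name `two_week_illness` is a hypothetical helper.

```python
def two_week_illness(weeks):
    """Number of contagious people in weeks 0..weeks-1, assuming each person is
    contagious for exactly two weeks and infects one new person per week."""
    new = [1]      # new[k]: people who became sick in week k (week 0: the first case)
    active = [1]   # active[k]: people contagious during week k
    for k in range(1, weeks):
        new.append(active[k - 1])            # everyone contagious last week infected one person
        active.append(new[k] + new[k - 1])   # people infected two weeks ago have recovered
    return active


cases = two_week_illness(12)
print(cases)                  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
print(cases[-1] / cases[-2])  # ratio of successive weeks, close to the golden ratio
```

The ratio between consecutive weeks settles near 1.618, which is the sense in which this model still grows like (1.6)^n.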

The second detail we are missing is that there are only so many people out there, and that once people are no longer contagious, they don’t get sick again. This detail is really what reins in the growth of a disease. It will take a little bit of work to factor these details into our equations for the virus.

The SIR Model of Disease Spread

Using the modifications we have just made, we arrive at the SIR-model for disease spread. SIR is an acronym for the 3 ‘types of people’ that the model takes into account. S stands for the susceptible people – those who have not yet become sick. The I stands for the infected people, those who are currently contagious (even if they are not showing symptoms). The R stands for the recovered people, by which we mean people who have had the disease and are no longer contagious. Notice that this would include people who have died, since they are no longer causing others to be infected.

The term ‘recovered’ is therefore a little misleading. You can make more complicated models that take into account the real differences between those who die and those who make a full recovery; for instance, those who die from a disease were likely in a hospital, and for this and other reasons probably caused fewer people to become infected than someone with only a mild case. For the sake of simplicity, we will ignore this difference. A mathematician working on a problem like this would take into account many more factors than we have included here, and would probably create more categories as well (like people who are infected but haven’t shown symptoms yet). As it turns out, though, the SIR-model already produces curves with the same shape as the ones we are supposed to be flattening, so we won’t make things any more complicated than we need to.

The Mathematical SIR-Model

In order to write down this model, we need to define some notation. First, we will have our three variables, S, I, and R, as defined earlier, to count the susceptible, infected, and recovered people. We will also need two constants a and b. The constant a will be a measure of how quickly the disease is transmitted from person to person, and b will measure how quickly people recover from the disease once they get it.

We are modeling how quickly the disease is spreading, so we need a way to write down the ‘speed’ of a variable instead of just its value. The way we usually write this in mathematics is S^\prime (read “S prime”). The variable S^\prime will measure how quickly S is changing, I^\prime measures how quickly I is changing, and R^\prime measures how quickly R is changing.

Now we can build our model with these variables. To make sense of the model, we will first describe the conceptual process that we are modelling. In the SIR-model, there are only two types of changes we must consider: how many susceptible people become ill, and how many ill people recover. You can picture this process with the diagram below:

\textrm{Susceptible Population} \longrightarrow \textrm{Infected Population} \longrightarrow \textrm{Recovered Population}

If we can understand how both of the arrows work, then we can build all of our equations out of this diagram. The second of the two arrows is the easier one to grapple with. If we have a population of people infected by a disease that normally lasts for two weeks, then on an average day 1 out of 14 of them will recover. This recovery rate is exactly what we meant by the constant b earlier (here b = 1/14 per day), and it depends on how long the disease lasts, as well as on how successful the medical system is at speeding up recovery. So, on a given day, a certain fraction of all infected people move from the I category into the R category. This fraction is b, and the number of people who move is b*I. So the arrow I \to R has the value b*I associated with it. Since recovery causes I to go down, we have to subtract b*I from I^\prime (going down = negative change = negative I^\prime value). For the same reason, we have to add b*I to R^\prime.
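As a quick worked example with made-up numbers, suppose 1,400 people are currently infected with a disease that lasts about two weeks:

b = \dfrac{1}{14} \textrm{ per day}, \quad b*I = \dfrac{1}{14} * 1400 = 100 \textrm{ people per day}.

So on that day about 100 people move from I to R: the recovery term contributes -100 to I^\prime and +100 to R^\prime.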

The first arrow is a bit more complicated. There are three primary factors to consider in how many people move from being susceptible to being infected. Firstly, we need the ‘transmission constant’ a from earlier, which reflects both how contagious the disease is from person to person and how much person-to-person interaction there is in public. We can also observe intuitively that more susceptible people means more people who can get sick, and that more infected people increases the ‘number of opportunities’ people have to become sick. When you factor all of this in, the process S \to I has the value a*S*I associated with it. For the same reasons about increases and decreases as in the previous paragraph, we subtract this value in our S^\prime equation and add it in our I^\prime equation.

We have now arrived at the SIR-model, whose equations are now listed below:

S^\prime = - a*S*I, \ \ \ I^\prime = a*S*I - b*I, \ \ \ R^\prime = b*I.
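The three equations translate almost word for word into code. Here is a minimal Python sketch; the function name and the sample values (S, I, R as fractions of the population, with a = 0.5 and b = 0.1 per day) are illustrative assumptions, not taken from any particular study. Because the S \to I flow is subtracted in one equation and added in another, and likewise for I \to R, the three rates always add up to zero, which is the model’s way of saying the total population stays the same.

```python
def sir_derivatives(S, I, R, a, b):
    """Return (S', I', R') for the SIR model.
    R does not appear on the right-hand side; it is passed in only for symmetry."""
    infections = a * S * I   # flow along the arrow S -> I
    recoveries = b * I       # flow along the arrow I -> R
    dS = -infections
    dI = infections - recoveries
    dR = recoveries
    return dS, dI, dR


# Illustrative values: 97% susceptible, 3% infected, nobody recovered yet
dS, dI, dR = sir_derivatives(S=0.97, I=0.03, R=0.0, a=0.5, b=0.1)
print(dS, dI, dR)
print(dS + dI + dR)  # (essentially) zero: people only move between categories
```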

To put all of this into words: the equation S^\prime = - a*S*I means that on a daily basis the number of susceptible people decreases by a*S*I; the equation R^\prime = b*I means that b*I sick people stop being sick each day; and the number of people who are currently sick changes by a*S*I - b*I, which could be positive (more sick people) or negative (fewer sick people).

What Does SIR Predict?

Now, the curves given to us by the SIR-model are in fact the curves that we have been seeing everywhere in the news. Here is an image of how all three variables change in a certain situation.

In this image, Blue = Susceptible, Green = Infected, Red = Recovered
This image is listed as public domain, link at end of article

The curve we have been seeing the most is the green one. This shape is familiar to us now: it is the curve we are supposed to flatten.
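Curves like these can be produced by stepping the SIR equations forward in time. The sketch below uses forward Euler steps of a fraction of a day, which is one simple numerical method and not necessarily what was used to make the image; the starting values (S = 0.97, I = 0.03, R = 0 as fractions of the population) and the constants a = 0.5, b = 0.1 are made up for illustration, and `simulate_sir` is a hypothetical helper.

```python
def simulate_sir(S0, I0, R0, a, b, days, dt=0.01):
    """Approximate the SIR curves with forward Euler steps of size dt (in days).
    Returns one (day, S, I, R) tuple per day."""
    S, I, R = S0, I0, R0
    steps_per_day = round(1 / dt)
    curve = [(0.0, S, I, R)]
    for step in range(1, days * steps_per_day + 1):
        infections = a * S * I   # computed from the current values before updating
        recoveries = b * I
        S -= infections * dt
        I += (infections - recoveries) * dt
        R += recoveries * dt
        if step % steps_per_day == 0:   # record once per day
            curve.append((step * dt, S, I, R))
    return curve


curve = simulate_sir(S0=0.97, I0=0.03, R0=0.0, a=0.5, b=0.1, days=60)
peak_day, _, peak_I, _ = max(curve, key=lambda row: row[2])
print(f"Infected fraction peaks at about {peak_I:.2f} around day {peak_day:.0f}")
```

Plotting the S, I, and R columns of `curve` against the day gives the three shapes in the image: S falling, I rising and then falling, and R rising toward a plateau.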

What is ‘Flattening the Curve’ Mathematically?

The constant press to ‘flatten the curve’ is everywhere now. But what does this actually mean in terms of the model, and how do we accomplish it?

One essential property of a model is that the shapes of the graphs are completely determined by the initial conditions (the ‘start values’ of all the variables) and the values of the parameters. To change the shape of the curves, then, we must either change the initial values of S, I, R or change the constants a, b. In the case of a disease, the initial values of the variables only have one option: one person infected, none recovered, and everyone else susceptible. That’s it. So, if we want to flatten the curve, we must do something to alter the values of a, b. So, what factors contribute to these constants?

We will talk about b first. Remember, b measures how quickly contagious people recover, which means that the value of b is determined primarily by two things: the underlying biology of the disease, and the medical technology available for treating it. For bacterial infections, developing and administering antibiotics can drastically increase the value of b, since people recover faster. For a viral infection like COVID-19, however, antibiotics do not help, because they do not work against viruses. While there are things that the medical system can do to increase b for a virus, it is much more difficult.

And in any case, b is not the main lever behind the flattening we are seeing in the media. Increasing b does lower the peak, and of course lowering the peak means fewer people will be infected, which is wonderful. But the goal of our quarantining is not only to lower the peak but also to delay it, so that our medical systems have as long as possible to prepare, and since b is so hard to change for a virus, the constant we can realistically work on is a.

The constant a is affected by many, many more factors, and so it is much easier to change. This constant tells us the speed at which the disease spreads from person to person. Part of this is of course determined by biology: a disease that can spread through sneezes and coughs will have a larger value of a than one that cannot. But the ways in which we interact with one another can also decrease or increase a. The more we physically interact with one another, the larger the value of a. The more we socially distance and wash our hands, the smaller the value of a. When you study the graphs on a computer, you can see that decreasing a not only lowers the peak but also delays it. This is what we want to do.
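Here is a minimal sketch of that comparison, assuming forward Euler time steps and made-up parameter values: b = 0.1, S and I as fractions of the population, a = 0.5 for ‘business as usual’ and a = 0.3 for ‘distancing’. The function `peak_of_infection` is a hypothetical helper that tracks the highest value of I and the day it occurs. With the smaller a, the peak should come out both lower and later.

```python
def peak_of_infection(a, b, S0=0.97, I0=0.03, days=150, dt=0.01):
    """Return (day, fraction infected) at the peak of I, using forward Euler steps.
    R is not needed here, since it does not feed back into S or I."""
    S, I = S0, I0
    best_t, best_I = 0.0, I
    for step in range(1, round(days / dt) + 1):
        infections = a * S * I
        # Both updates use the old values of S and I on the right-hand side
        S, I = S - infections * dt, I + (infections - b * I) * dt
        if I > best_I:
            best_t, best_I = step * dt, I
    return best_t, best_I


b = 0.1                  # hypothetical recovery rate (about a 10-day illness)
for a in (0.5, 0.3):     # hypothetical 'normal' vs 'distancing' transmission
    day, height = peak_of_infection(a, b)
    print(f"a = {a}: peak of {height:.2f} around day {day:.1f}")
```

The output should show the flattened curve: a smaller peak that arrives noticeably later, which is exactly the extra time medical systems are hoping for.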


Unfortunately, I don’t have the time or the know-how to embed an actual visual demonstration of how the curve flattens into this article. However, I have listed below some websites that have this feature, as well as some very well-made YouTube videos about the same ideas. Feel free to leave me questions in the comments or email me at my website’s email address.

Stay safe everyone!

Links and Resources

Website for Visualizing the SIR-Model:

This site uses the variable \beta (aka ‘beta’) in place of a, and \gamma (aka ‘gamma’) in place of b. You can set the initial values of S, I, and R here, as well as the values of the parameters, and the overall time frame. My recommendation for some good visuals:

Susceptible = 1, Recovered = 0, Infected = 0.03, Days = 40, Gamma = 0.1.

Mess around with values of Beta between 0.2 and 2 to watch the curve ‘flatten’! Changing the value of Gamma will give you an idea of the difference between an effective and ineffective hospital system.



(COVID-19 and the SIR-Model)


(Exponential Growth)

(Modeling Viruses)

Source for Image:
