The math behind the coronavirus

April 7, 2020

We are currently living through a major crisis, the corona crisis. More than one million people worldwide have already been infected and more than 60,000 have died. Many countries are experiencing large outbreaks and their health care systems are running into capacity problems. This is one of the many articles you have probably seen about the coronavirus. What distinguishes it from the rest is that it dives a lot deeper into the mathematical models which researchers use to forecast the spread of the virus.

It is of course extremely important in these times to be able to predict how the coronavirus will spread in the coming weeks. Without those predictions, we would have no idea what a good strategy to tackle it would be. Furthermore, predictions give an estimate of the pressure on hospitals, so that the medical sector can prepare itself better. Good models can therefore potentially save a lot of lives.

The basic reproductive number

So, what makes the spread of the coronavirus so quick and so dangerous? An important factor is the exponential growth of the number of infections. If we did not take measures against it, the virus would keep spreading at an exponential pace and things would get out of hand really fast. This exponential growth follows from a quantity which is used a lot in modelling the spread of viruses. It is called the “basic reproductive number” and is denoted by R_0. It stands for the average number of people who are infected by one infected person in a fully susceptible population. A fully susceptible population means that everyone in the population can still get the virus, so no one is already infected and no one is immune. For the coronavirus, R_0 is estimated to be around 2 or 3. For reference, ebola has an R_0 of about 2, SARS has an R_0 of about 4 and the seasonal flu has an R_0 of around 1.3.


This basic reproductive number is also what makes the coronavirus an epidemic, since a disease with an R_0 bigger than 1 spreads as an epidemic. If R_0 lies around 1, the disease is called endemic. In the endemic case, the number of newly infected persons per day is approximately constant, so the total number of infections grows only linearly instead of exponentially, which makes the disease a lot easier to control. This is also what is meant by “flattening the curve” in the news: the number of newly infected patients per day should not keep growing, otherwise things will get out of hand really fast.
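The difference between the two regimes can be made concrete with a small sketch. It assumes, as a deliberate simplification, that every infected person infects exactly R_0 others in the next "generation" of infections and that nobody becomes immune; the function name and the value 2.5 are chosen for illustration only.

```python
def new_cases_per_generation(r0, generations, initial_cases=1):
    """Return the number of new cases in each generation of infections.

    Simplifying assumption: every case infects exactly r0 others,
    with no immunity and no interventions.
    """
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * r0)
    return cases


# R_0 = 2.5 is an illustrative value inside the "2 or 3" range from the text.
epidemic = new_cases_per_generation(r0=2.5, generations=10)
# R_0 = 1 corresponds to the endemic case.
endemic = new_cases_per_generation(r0=1.0, generations=10)

for generation, (a, b) in enumerate(zip(epidemic, endemic)):
    print(f"generation {generation:2d}: R_0=2.5 -> {a:8.0f} new cases, "
          f"R_0=1 -> {b:.0f} new cases")
```

With R_0 = 1 the number of new cases per generation stays constant, so the cumulative total grows linearly, while with R_0 = 2.5 the number of new cases is multiplied by 2.5 in every generation.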

One way to contain an epidemic is to vaccinate people. We would want the growth of the coronavirus to become linear instead of exponential, so we want to push the average number of people infected by one person down from R_0 to 1. With an R_0 of 2, this could be done by vaccinating approximately one half of the population: one person then no longer infects 2 persons on average but only 1. This is also why children in the Netherlands are vaccinated against measles. That disease has an R_0 of approximately 12-18 and it could spread extremely fast if nobody were vaccinated. Unfortunately, there is no vaccine yet for the coronavirus, so other measures had to be taken.
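The fraction that needs to be vaccinated follows from a short calculation, sketched here under the simplifying assumptions that the vaccine works perfectly and that everyone mixes uniformly. The symbol p, the immune fraction of the population, is introduced here for illustration. If a fraction p is immune, one infected person infects on average R_0 (1 - p) others, and requiring this number to be at most 1 gives:

    \[R_0 (1 - p) \leq 1 \quad \Longleftrightarrow \quad p \geq 1 - \frac{1}{R_0}.\]

For R_0 = 2 this gives p of at least 1/2, in line with the one half mentioned above. For measles, with R_0 between 12 and 18, it gives roughly 92% to 94% of the population.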

The SIR Model

This R_0 provides useful information about a virus, but there is of course a lot more to the picture. One of the most basic models used for modelling the spread of a virus is the SIR model. This model belongs to the so-called compartmental disease models, since it divides the people of a population into three compartments. The first group is the ‘susceptible’ group: the people who do not have the virus yet and are not immune. The second group is the ‘infected’ group, which has the virus, and the last group is the ‘recovered’ group, which consists of people who recovered from the virus. This is also where the name SIR comes from. Note that there is no separation between people who died and people who recovered: both go to the ‘recovered’ group. The idea of this model is shown in the picture below.

To use some mathematics, we denote the sizes of these groups by S(t), I(t) and R(t). Note the dependence on t: these groups are dynamic and their size changes over time. The spread of a virus is then modeled as a system of ordinary differential equations given by:

    \begin{align*} \frac{d S(t)}{dt} &= -\frac{\beta I(t) S(t)}{N} \\ \frac{d I(t)}{dt} &= \frac{\beta I(t) S(t)}{N} - \gamma I(t) \\ \frac{d R(t)}{dt} &= \gamma I(t). \end{align*}

So, let’s try to understand what this system tells us. The first equation says that S(t), the number of susceptible persons, decreases at a rate depending on itself, on the number of infected persons at that moment and on \beta, which is the transmission rate. This makes sense of course: if there are a lot of susceptible persons, it is more likely that their total number will decrease quickly. In the same vein, if there are a lot of infected people, the susceptible people will get infected a lot quicker. The value of the transmission rate \beta depends on the disease: if the disease is transmitted easily, \beta will have a high value. Furthermore, the rate at which the infected people move to the recovered group is given by \gamma I(t). Here, \gamma is a constant which depends on how fast people recover from a particular disease. Note that when we add all equations we get the following:

    \[\frac{d S(t)}{dt} + \frac{d I(t)}{dt} + \frac{d R(t)}{dt} = 0.\]

Then integrating over t gives:

    \[S(t) + I(t) + R(t) = N,\]

where N is a constant which equals the population size. Deriving an analytical solution to this system of equations is a bit complicated, since the system is non-linear. What we can do, and what is really interesting, is use this model for simulation studies. I performed a simulation for the Dutch population, which is shown below.

This simulation was performed using the software package R, and the time is given in days. As can be seen, at first most of the population is susceptible to the virus, but after around two months about 60% of the Dutch population has caught the virus and already recovered from it. This prediction is of course not the most accurate, since a lot of things are not included in the model, such as the current lock-down.
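A simulation of this kind can be sketched in a few lines. The version below is written in Python rather than R and is not the author's script: it integrates the SIR equations with the classical fourth-order Runge-Kutta scheme, and every parameter value (population size, initial number of infected people, \gamma, and a \beta chosen so that \beta / \gamma = 2.5, using the standard SIR relation R_0 = \beta / \gamma) is an assumption made for illustration.

```python
# Illustrative parameters; all values are assumptions, not the author's settings.
N = 17_400_000       # approximate Dutch population in 2020
I0 = 100             # assumed number of infected people at t = 0
GAMMA = 1 / 10       # assumed recovery rate (mean infectious period of 10 days)
BETA = 2.5 * GAMMA   # transmission rate chosen so that beta / gamma = 2.5


def sir_rhs(s, i, beta, gamma, n):
    """Right-hand side (dS/dt, dI/dt, dR/dt) of the SIR system."""
    infections = beta * i * s / n
    recoveries = gamma * i
    return (-infections, infections - recoveries, recoveries)


def simulate_sir(n, i0, beta, gamma, days, dt=0.1):
    """Integrate the SIR equations with the classical Runge-Kutta scheme.

    Returns a list of (t, S, I, R) tuples, one per time step of size dt.
    """
    state = (float(n - i0), float(i0), 0.0)
    history = [(0.0,) + state]

    def rhs(y):
        return sir_rhs(y[0], y[1], beta, gamma, n)

    def shifted(base, slope, h):
        # Move every component of the state a distance h along the slope.
        return tuple(x + h * k for x, k in zip(base, slope))

    for step in range(1, round(days / dt) + 1):
        k1 = rhs(state)
        k2 = rhs(shifted(state, k1, dt / 2))
        k3 = rhs(shifted(state, k2, dt / 2))
        k4 = rhs(shifted(state, k3, dt))
        state = tuple(
            y + dt / 6 * (a + 2 * b + 2 * c + d)
            for y, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        history.append((step * dt,) + state)
    return history


history = simulate_sir(N, I0, BETA, GAMMA, days=180)
peak_t, _, peak_i, _ = max(history, key=lambda row: row[2])
final_r = history[-1][3]
print(f"peak of infections on day {peak_t:.0f}: {peak_i / N:.1%} of the population")
print(f"recovered after 180 days: {final_r / N:.1%} of the population")
```

The outcome depends strongly on the assumed values of \beta, \gamma and the initial number of infected people, so it should not be expected to reproduce the figure above exactly.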

The real model used

So, now you are probably asking yourself: what models do the researchers actually use? The foundation of these models is most of the time still the SIR model, but with a lot of features added to it. One feature which is included in their model is the incubation period. They did this by adding an “Exposed” phase between the susceptible and the infected phase, which turns the model into the SEIR model. Another important addition to the model was made by including travel data.
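For illustration, a textbook form of the SEIR system, which is not necessarily the exact model the researchers used, extends the SIR equations with a compartment E(t) of exposed persons. The symbol \sigma is introduced here as an assumption: it is the rate at which exposed persons become infectious, so that 1/\sigma is the mean incubation period.

    \begin{align*} \frac{d S(t)}{dt} &= -\frac{\beta I(t) S(t)}{N} \\ \frac{d E(t)}{dt} &= \frac{\beta I(t) S(t)}{N} - \sigma E(t) \\ \frac{d I(t)}{dt} &= \sigma E(t) - \gamma I(t) \\ \frac{d R(t)}{dt} &= \gamma I(t). \end{align*}

Newly infected persons now first enter the exposed group and only move on to the infected group after the incubation period, which delays the growth of I(t) compared to the SIR model.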

On January 31, the WHO published their SEIR model which was used to model the spread of the coronavirus in Wuhan. This model is a lot more complicated, for example the equation for the number of susceptible persons in the population is given by:

    \[\frac{d S(t)}{d t} = - \frac{S(t)}{N}\left(\frac{R_0}{D_I} I(t) + z(t) \right) + L_{I,W} + L_{C,W}(t) - \left(\frac{L_{W,J}}{N}+ \frac{L_{W,C}(t)}{N}\right) S(t).\]

In this equation, we again see the basic reproductive number R_0. Furthermore, D_I stands for the mean infectious period, z(t) stands for the force of infection in the baseline scenario and the L variables all have to do with travel data. It is beyond the scope of this article to dive deep into this model, but hopefully it gives you a taste of how the WHO is using mathematics to forecast the spread of the coronavirus.

So, as we have seen, mathematics can give us a lot more insight into how the coronavirus is spreading, and in this way it can even help with saving lives. Hopefully you enjoyed the article and maybe it even motivates you a bit to open your mathematics books again in these quarantine times.

This article was written by Stan Koobs
