A Mathematical Model of Epidemics—A Tutorial for Students

: This is a tutorial for the mathematical model of the spread of epidemic diseases. Beginning with the basic mathematics, we introduce the susceptible-infected-recovered (SIR) model. Subsequently, we present the numerical and exact analytical solutions of the SIR model. The analytical solution is emphasized. Additionally, we treat the generalization of the SIR model including births and natural deaths.


Introduction
The coronavirus disease 2019 (COVID- 19) is an infectious disease that emerged in Wuhan, China in December 2019 and has rapidly spread in China and worldwide.On 30 January 2020, the World Health Organization (WHO) declared the outbreak as a public health emergency of international concern.Furthermore, the WHO announced the COVID-19 outbreak as a "pandemic" on 11 March 2020 [1,2].The prevention of infection is the responsibility of every human being.
A few mathematical models have been developed to describe the dynamics of the evolution of COVID-19 [3][4][5][6][7].Among them, the susceptible-infected-recovered (SIR) model is treated as a basic mathematical model for the spread of epidemic diseases.This model was proposed in 1927 by Kermack and McKendrick [8,9], and has been used as a basic model for epidemics [10][11][12][13][14].
The SIR model is not difficult from the viewpoint of mathematics, and can be understood even by high-school students.We present a numerical solution and an exact analytical solution [15] of the SIR model herein.The analytical solution is emphasized because it is not well known.We provide a more straightforward derivation of the analytical solution compared with that in the original paper; it will serve as a good opportunity for students to learn differentiation and integration.Furthermore, we treat the generalization of the SIR model with vital dynamics including birth and natural death processes.We demonstrate that the complex SIR model can be reduced to an Abel-type differential equation.

Exponential Growth
The spread of an epidemic disease depends on both the amount of contact between individuals and the possibility of an infected person transmitting the disease to another person.If each infected person is in contact with two other people before they recover, the disease will soon begin to spread rapidly.Assuming that recovery requires one day, the number of sick people will double each day.This situation is illustrated schematically in Figure 1.A numerical series for which the ratio of each two consecutive terms is a constant r (in this case, r = 2) is called a geometric series.The n-th term is expressed as follows: where the initial value is set to be A(0).If r is greater than 1, then the series increases and approaches positive infinity rapidly (we assume A(0) > 0); if r is smaller than 1, the series decreases and approaches zero.At the border of r = 1, the series remains constant.As an example, when one borrows money, the interest is typically a compound interest, i.e., a geometric series.When the annual interest is 30%, the return in five years will increase rapidly as follows: Extending n to a real number, we introduce the exponential function The differentiation of the exponential function can be calculated as follows: The special constant r that satisfies the limiting equation is called Napier's constant e.With this definition of e, the differentiation of the exponential function becomes Hereafter, we treat the form of the exponential function as The exponential function grows for a > 0, and decays for a < 0. When a = 0, the function becomes constant.We have assumed that C > 0.
The inverse function of x, t = t(x), is a logarithmic function written as where r is the base.The logarithmic function with the base e is called a natural logarithmic function and is denoted as follows: log e x = ln x.
The exponential function always explodes for a > 0. To verify whether it grows rapidly or slowly, we may consider the doubling time.From the condition we obtain that the doubling time is a time difference that satisfies It is instructive to use graph plots to observe the growth behavior.Both the linear and semi-log plots are shown in Figure 2. A semi-log plot has one axis (t) on a linear scale, and the other on a logarithmic scale.When the function grows exponentially, a linear increase is observed in a semi-log plot.
Next, we explain the differential equation.Let us consider the case in which the increase in a variable x(t) is proportional to x(t), i.e., The difference in the left-hand side becomes larger when the time difference ∆t becomes longer; hence, ∆t is multiplied to the right-hand side.If we divide both sides by ∆t and take the limit of ∆t → 0, the following differentiation definition is obtained: It becomes a dx(t)/dt equation, which is known as a differential equation.Because Equation ( 7) satisfies the differential equation, it is regarded as a general solution to the differential equation shown in Equation (13).Here, C is an arbitrary constant and is known as an integral constant.Next, we discuss the mathematical model of the spread of epidemic diseases.

Mathematical Model of Epidemics
If exponential growth continues, the number of the infected individuals increases.We may consider that the infected individuals recover and develop immunity.Subsequently, the number of recovered individuals increases, but we hope that the spread of the disease will subside.We may consider a closed society of N individuals and classify individuals into three types: susceptible (S), infected (I), and recovered (R).The SIR model [8,9] is a mathematical model for the spread of epidemic diseases that describes the changes in the numbers of the three types of individuals.
We denote the number of susceptible (S) individuals by x(t), the number of infected (I) individuals by y(t), and the number of recovered or removed (R) individuals by z(t).The time t is measured in days, t (day).We may consider that each person is in contact with m persons per day on average.Because the proportion of infected individuals is y(t)/N, the number of contacts between the infected and susceptible persons becomes m(y(t)/N)x(t) per day.If we set the probability of infection for each contact as p, the number of the newly infected individuals within ∆t days becomes in total.If we set β = mp, the number of non-infected individuals (S) from the t-th day through the (t + ∆t)-th day changes as For the limit of ∆t → 0, the derivative can be obtained as follows: which is the same procedure as that derived for Equation (13).
Meanwhile, the infected individuals recover at a removal rate of γ per day.Subsequently, the increase in the number of the recovered persons becomes The inverse of γ is the expected duration of infection.Here, the number of the recovered persons includes the number of the deceased persons because it refers to the number of individuals who cannot possibly infect others.
Because the total number of individuals is set to be constant such that we express y(t) as follows: The change in the number of the infected individuals can be written as Gathering the equations, we have These are simultaneous equations, where β is the rate of infection and γ the rate of recovery (the rate of quarantine).
If we introduce the transformations of variables, such as x = x/N, ỹ = y/N, z = z/N, and t = βt, then Equations ( 21)-( 23) can be rewritten as The equations are in non-dimensional forms.The use of x, ỹ, and z implies that we only need to consider the fraction of the number of individuals.A single parameter representing the ratio of β and γ appears as follows: The number R 0 is known as the basic reproduction number [16][17][18], and the number of infected individuals increases when R 0 > 1 (precisely, in the limit of x(0) → 1), whereas it decreases when R 0 < 1.

Numerical Solution of SIR Model
The SIR equations (Equations ( 21)-( 23)) were derived; subsequently, the differential equations can be solved numerically.The simplest method to solve the differential equation is as follows: It is noteworthy that Therefore, we may calculate with a time step size of ∆t.This first-order numerical procedure is known as the Euler method.
However, it has a convergence problem; therefore, the errors will be larger.Although many methods have been proposed to reduce errors, we employ the second-order Runge-Kutta method.The python program is given in Appendix A. As initial conditions, we use with N = 10,000, N 1 = 9900, N 2 = 10, and N 3 = 0.The calculated results using parameters β = 0.5 and γ = 0.2 are shown in Figure 3a.As shown, the number of infected individuals (I) increases with time, reaches a peak, and gradually decreases.Meanwhile, the number of the recovered individuals (R) increases gradually.Thus far, we have discussed the number of the infected individuals, y(t).The number of newly infected individuals is reported daily.How can we express the number of the newly infected individuals in the framework of the SIR model?The newly infected individuals are added to y(t), but the recovered individuals z(t) are subtracted from y(t) as well.Therefore, the number of the newly infected individuals, y new (t), is expressed as with ∆t = 1.Because x(t) + y(t) + z(t) = N, we can express Based on Equation ( 21), we obtain Because y new (t) is proportional to y(t), the behavior of the temporal variation of y new (t) is similar to that of y(t).In Figure 3a, y new (t) is plotted as a dotted red curve.
In Figure 3b, the temporal variations of the numbers of individuals for the three types (S, I, R) are compared for β = 0.5, 0.45, 0.4, 0.35, and 0.3.Here, γ is fixed as 0.2.The delay of the epidemic and the reduction in the peak height (number of the infected individuals) were shown as β decreases.This trend is typically associated with the effect of social distancing.
The basic reproduction number R 0 is frequently used as the "R naught" or "R zero".Sometimes, it is treated as a time-varying variable and used as a measure for easing lockdowns.In the SIR model, R 0 is a constant and defined as shown in Equation (27).If we formally define a time-varying number as in the framework of the SIR model, it judges whether y(t) increases (R 0 (t) > 1) or decreases (R 0 (t) < 1) in Equation (22).In Figure 3a, this time-varying R 0 (t) is also plotted as a dotted black curve.The initial value R 0 was set to be 2.5; as shown, R 0 (t) gradually decreases, becomes 1 at the point where y(t) reaches a peak, and further decreases below 1.The terminology "effective reproduction number" is sometimes used, but it has several meanings.Finally, we remark on the cumulative number of the infected individuals, which is reported daily.In this model, the cumulative number is expressed as y(t) + z(t) because it includes the number of the recovered individuals.In the initial stage of the outbreak of an epidemic, this number increases exponentially, and it is appropriate to use the semi-log plot as shown in Figure 2b.The initial increase in the cumulative number of the infected individuals, y(t) + z(t), is plotted in a semi-log plot for several β in Figure 4, where γ is fixed.As shown, the initial increase is linear, and the slope decreases as β decreases.

Analytical Solution of SIR Model
We have demonstrated the numerical solution of the SIR model.Although the SIR model was proposed in 1927 [8,9], the analytical solution of the differential equations was only presented in 2014 [15].In the analytical solution, only exponential and logarithmic functions are involved.Therefore, high-school students will be able to understand the analytical solution of the SIR model.
Next, we present the main stream of the derivation of the analytical solution.Some detailed calculations are given as questions, and the solutions can be found in Appendix B. Equations ( 21)-( 23) are simultaneous differential equations; we can rewrite them as the differential equation of a single function.The procedure is the same as that used to solve a linear simultaneous equation.
We employ a more direct method of deriving the analytical solution compared with that in the original paper [15].First, we rewrite Equation ( 21) as where β = β/N.If we differentiate both sides, we obtain Inserting Equation (37) into Equation ( 21) yields Comparing Equation (38) and Equation (39), we have Therefore, we have obtained a single differential equation of x(t).
Next we introduce the following function: Therefore, we have the following equation for φ: An ordinary differential equation of this form is known as the Bernoulli differential equation.The general solution is obtained as follows: where C 1 is an integral constant.
Problem 1. Show that the general solution of the Bernoulli differential Equation ( 42) is given by Equation ( 43).
From the relation of the inverse function of Equation ( 41), we have Using Equation ( 21), we obtain Moreover, from Equations ( 21) and ( 23), we have Subsequently, the relation between x(t) and z(t) becomes where C 2 is an integral constant.If x(0) = N 1 and z(0) = 0 for t = 0, then C 2 becomes N 1 .Therefore, we have From the relation x(0) + y(0) + z(0) = N, we obtain the integral constant C 1 as If we substitute C 1 into Equation ( 43), we have By integrating this equation, t is obtained as a function of x as follows: Problem 2. Confirm that Equation (51) satisfies differential Equations ( 21)- (23).
For the convenience of calculation, we change a variable such that ξ/N 1 → ξ.Subsequently, we rewrite it as It is noteworthy that β appears here instead of β.
Because we have obtained the form of the integral of one variable, we then calculate t(x).We can perform a numerical integration by summing up an integrand with a sufficiently small step size, as follows: It is noteworthy that the direction of the integral is negative because x(t)/N 1 < 1 in Equation ( 52), and the integrand has a negative value.If t(x) is obtained, then y and z can be calculated using Equations ( 45) and (48), respectively.The calculated results of x(t), y(t), and z(t) are shown in Figure 5, where the same conditions are used as in Figure 3a.As expected, the analytical solution and the numerical solution obtained using the Runge-Kutta method are the same.Next, we analyze the property of the differential equation.We can investigate the behavior of t → ∞.The variable ξ in the limit of t → ∞, x(∞)/N 1 , is obtained when the denominator of the integrand of Equation (52) becomes zero.If we write the solution to the equation To solve an equation such as Equation (54) numerically, an iterative method such as Newton's method is used.However, for the case of a single variable, it is sufficient to plot the left-hand side of Equation ( 54) and find an intersection with a zero.Figure 3b shows a comparison of the behaviors of the temporal evolutions of x(t), y(t), and z(t) for β = 0.5, 0.45, 0.4, 0.35, and 0.3, with γ = 0.2 (N = 10,000, N 1 = 9900).Figure 6 shows the solutions of Equation (54) under the same conditions.The solutions reproduce the limit values of x(t) for t → ∞ in Figure 3b.Using R 0 defined in Equation ( 27), we can rewrite Equation (54) as in the limit of N 1 /N → 1.This relation is known as the solution of the final size equation [19].The final number of the infected individuals, (z(∞) = N − x(∞)), is a variable of interest.The β dependence of this number is tabulated in Table 1.As shown, the final number of the infected individuals decreases with the infection rate β.Furthermore, the table presents the peak values of the infected individuals.
Table 1.The final number of the infected individuals: N = 10,000, N 1 = 9900, N 2 = 10, N 3 = 0, γ is fixed as 0.2.The peak values of the infected individuals are also shown.

The Final Number The Infected (y(t))
The Newly Infected (y new (t))

Analytical Approach of SIR Model with Vital Dynamics
The standard SIR model has been extended to include some other factors.One example is a model that considers birth and natural death processes [20,21].The nonlinear equations are as follows: where µ represents the natural death rate, which is assumed to be equal to the birth rate.Harko et al. [15] studied this model and demonstrated that these simultaneous equations can be reduced to an Abel-type equation.
Next, we show a more straightforward derivation than that presented in the original paper [15].Similar to the procedure used for the standard SIR model, we first eliminate a variable y(t) from Equations ( 56) and (57); where Next we introduce a function Subsequently, we obtain an equation for φ as follows: By introducing a function we obtain the equation for v as follows: An ordinary differential equation of this form is known as the Abel-type first-order differential equation of the first kind.The mathematical properties of the Abel-type equations have been investigated intensively.One may use an iterative solution, as demonstrated by Harko et al. [15].If we set µ = 0, this model becomes the standard SIR model; Equation (63) is reduced to Equation (A4) in the case of µ = 0. Furthermore, some recent developments regarding the solution of the Abel equation of the first kind have been reported [22,23].
Next, we briefly analyze the properties of the SIR model with vital dynamics.We may solve Equations ( 56) and (57) simultaneously or solve an equation comprising one variable, i.e., Equation (63).For the analytical approach, it is beneficial to introduce the following function: Subsequently, we can rewrite Equation (63) as or This type of equation is known as the Abel equation of the second kind.We plot x(t), y(t), and z(t) of the SIR model with vital dynamics by solving Equations (56) and (57) simultaneously or solving Equation (65), in Figure 7.Both approaches yielded the same results, as expected.The parameters µ varied as 0.0, 0.02, 0.04, 0.06, and 0.08, whereas β and γ were fixed as 0.5 and 0.2, respectively.In the plot, the data of µ = 0.0 are shown by thick dashed curves.As shown, a minimum occurred in x(t), and y(t) remained non-zero for t → ∞.This represents a disease in the endemic steady state.New births can provide more susceptible individuals, thereby resulting in a sustained epidemic.

Discussion and Conclusions
We explained the mathematical model of the spread of epidemic diseases.It is noteworthy that in the SIR model, a closed society was assumed, and all the individuals have the same contact possibility.Hence, the obtained results are valid only within such a model.
Next, we consider the meanings of the infected (I) and the recovered (R).In this model, recovered (R) refers to individuals who do not infect others.Hence, the individuals who have been quarantined are regarded as recovered individuals.Therefore, γ is called the rate of recovery or the rate of quarantine.To reduce R 0 , we may reduce β because R 0 is expressed by β/γ.This can be realized by social distancing (physical distancing).Because the inverse of the denominator, γ, is the average time of infection, shortening the time of confirming the infection by testing provides the same effect as social distancing.In Equations ( 21)- (23), β is multiplied by x(t)y(t), whereas γ is only multiplied by y(t).
Next, we discuss the reproduction number.The Robert Koch Institute (RKI) of Germany publicizes the reproduction number R daily.The point estimate of R for a specific day is estimated as the quotient of the number of incident cases on that day divided by the number of incident cases four days earlier.The incident cases are not directly extracted from the notification system, but they are estimated using statistical analysis to predict the true onset of illness.To reduce random effects, a moving 4-day or 7-day average is used.The reproduction number R of the RKI is clearly defined and reflects the measure of disease spread; i.e., R is either greater or smaller than 1.However, it is noteworthy that the absolute value of such a time-varying reproduction number depends on the definition.
As for the mathematical models of epidemics, some extended models have been developed, such as the susceptible-infectious-recovered-deceased (SIRD) model [24][25][26].This model differentiates between recovered (i.e., individuals who have survived the disease and are now immune) and deceased individuals.Another example is the susceptible-exposed-infectious-recovered (SEIR) model [27][28][29][30], where an "exposed" category is added.It considers the situation involving a significant incubation period, during which individuals have been infected but are not yet infectious.
We learned not only the numerical solution of the SIR model, but also the exact analytical solution of the model.Furthermore, we treated the generalized SIR model with vital dynamics.Students can learn differentiation and integration using this tutorial.Furthermore, it will be beneficial if they indicate interest in social problems.

Figure 3 .
Figure 3.The numerical solution of the SIR model using the Runge-Kutta method; N = 10,000, N 1 = 9900, N 2 = 10, N 3 = 0. (a) Plot of the temporal variation of the susceptible (S), the infected (I), and the recovered (R) for β = 0.5 and γ = 0.2.In addition, the number of the newly infected individuals is shown by the red dotted curve.The time-dependent R 0 (t) is also shown in the black dashed curve.(b) The comparison of S, I, R for β=0.5, 0.45, 0.4, 0.35, and 0.3; γ is fixed as 0.2.

3 Figure 4 .
Figure 4.The semi-log plot of the cumulative number of the infected, (y(t) + z(t)).

Figure 5 .
Figure 5.The analytical solution of the SIR model.The conditions are the same as in Figure 3a.

Figure 7 .
Figure 7.The solution of the SIR model with vital dynamics.The parameter µ is varied as 0.0, 0.02, 0.04, 0.06, and 0.08, and the data of µ = 0.0 are shown by thick dashed curves.