An Epidemic Model with Time Delay Determined by the Disease Duration

: Immuno-epidemiological models with distributed recovery and death rates can describe the epidemic progression more precisely than conventional compartmental models. However, the required immunological data to estimate the distributed recovery and death rates are not easily available. An epidemic model with time delay is derived from the previously developed model with distributed recovery and death rates, which does not require precise immunological data. The resulting generic model describes epidemic progression using two parameters, disease transmission rate and disease duration. The disease duration is incorporated as a delay parameter. Various epidemic characteristics of the delay model, namely the basic reproduction number, the maximal number of infected, and the ﬁnal size of the epidemic are derived. The estimation of disease duration is studied with the help of real data for COVID-19. The delay model gives a good approximation of the COVID-19 data and of the more detailed model with distributed parameters.


Introduction
Mathematical modeling of infectious diseases attracts much attention due to successive epidemics, such as HIV, emerging in the 1980s and still continuing [1,2], SARS epidemic in 2002-2003 [3,4], H5N1 influenza in 2005 [5,6] and H1N1 in 2009 [7,8], Ebola in 2014 [9,10]. The ongoing COVID-19 pandemic has stimulated unprecedented efforts of mathematical modeling in epidemiology. A wide variety of mathematical approaches are developed to study epidemiological problems. However, sufficiently simple and validated models still remain in the focus of mathematical modeling in epidemiology.
Modern studies in mathematical epidemiology begin with the SIR model developed in the works by W. O. Kermack and A. G. McKendrick [11][12][13], stimulated by the Spanish flu epidemic in 1918-1919. Among many developments of such models, we can cite multi-compartment models [14][15][16], models with a time-varying or nonlinear disease transmission rate [17,18], multi-patch models [19][20][21], multi-group models incorporating the effect of the heterogeneity of the population [22], and epidemic models with vaccination and other control measures [23,24]. Random movement of individuals in the population is considered in spatiotemporal models in order to describe spatial distributions of susceptible and infected individuals [25,26]. A more detailed literature review can be found in the monographs [27,28] and review articles [29,30]. The conventional SIR model, which includes susceptible (S), infected (I), and recovered (R) compartments, and similar models assume that recovery and death rates at time t are proportional to the number of actively infected individuals I(t) at the same moment of time. This assumption does not take into account disease duration, and it can lead to a large error. In our previous work [31], we showed that this assumption leads to an overestimation of actual recoveries and deaths. Instead, if we use distributed recovery and death rates, properly chosen from real data, the description of the epidemic progression becomes more precise. However, since distributed recovery and death rates are not easily available, we develop a simpler delay model in this work. It gives close results, but it does not require precise immunological data. The model considered in [31] involves distributed recovery and death rates. The model considered in [32] is an extension of the model by incorporating the vaccinated compartment, and the resulting model is an immuno-epidemic model. The delay model, with disease duration delay, considered here is derived from the model proposed in [31] with an appropriate assumption on the recovery and death rate functions. The present model is quite different from the delay model considered in [33] as the delay parameter involved with the earlier model was the measure of the incubation period and the departure of infected individuals from the infected compartment was due to the imposition of quarantine measure. The model proposed and analyzed here is solely dependent on the disease duration period and without imposed quarantine.
Most of the existing delay epidemic models consider time delay either in the disease incidence function or in the susceptible recruitment function (Appendix A). The delay in the recovery and death rates has not been studied yet thoroughly. In this work, we introduce time delay in recovery and death rates with the average disease duration considered as the delay parameter.
In Section 2, we discuss the distributed model and derive the delay model where the discrete time delay estimates average disease duration. We obtain epidemic characterization of the delay model in Section 3. Then, in Section 4, we perform a numerical comparison among the distributed model, delay model, and conventional SIR model with the equivalent parameter values. Next, we discuss a method to estimate the value of disease duration using the real data of disease incidence in Section 5. In Section 6, we validate our delay model with epidemiological data collected during the COVID-19 epidemic. The main outcomes of the proposed model and its epidemiological implications are discussed in the concluding section.

Model Formulation
In contrast with the existing compartmental epidemiological models, we start the model derivation by the introduction of the class of newly infected individuals instead of the total number of infected individuals. This approach is appropriate to evaluate daily recovery and death rates. We recall in this section the model with a distributed recovery and death rate [31,33]. We then use this model to derive the delay model and study its properties in the next sections.

Model with Distributed Parameters
The number of newly infected individuals J(t) is determined by the rate of decrease of the number of susceptible individuals, J(t) = −S (t). Assuming that is constant, where I(t) is the total number of infected at time t and R(t) and D(t) denote, respectively, recovered and dead, we can write Following conventional epidemiological models, we set where β is the disease transmission rate. Let r(η) and d(η) be the recovery and death rates depending on the time-since-infection η. Then, the number of infected individuals who will recover at time t is given by the expression: and the number of infected individuals who will die at time t: Differentiating Equality (1), we obtain Thus, we obtain the following integro-differential equation model: with the initial condition S(0) = N, I(0) = I 0 > 0 (I 0 is sufficiently small as compared to N), R(0) = 0, D(0) = 0.

Reduction to SIR Model
In a particular case, if we assume the uniform distribution of recovery and death rates: where τ > 0 is disease duration, r 0 and d 0 are some constants, and if r 0 and d 0 are small enough, then the model (3) can be reduced to the conventional SIR model (see [31]): However, the approximation of a uniform distribution of recovery and death rates may not be precise since infected individuals are unlikely to recover or die at the beginning of the disease. In the following subsection, we consider another choice of recovery and death rates, which will approximate the real scenario of recovery and death more accurately.

Delay Model
Let us assume that disease duration is τ, and the individuals J(t − τ) infected at time t − τ recover or die at time t with certain probabilities. This assumption corresponds to the following choice of the functions r(t − η) and d(t − η): where r 0 , d 0 are constants, r 0 + d 0 = 1, and δ is the Dirac delta-function. Then, Note that J(t) is the number of newly infected individuals appearing at time t. If we assume that the first infected case was reported at time t = 0, then we can set J(t) = 0 for all t < 0. Now, integrating the above two equations from 0 to t and assuming that R(0) = D(0) = 0, we obtain Then, instead of (2), we have and from (3b), From (5) we obtain Hence, Once we obtain the solution S(t) from (8), then we can find I(t) using the following relation: Hence, System (3) is reduced to the following delay model: with J(t) = 0 for all t < 0. A similar model was proposed in [33] without derivation from the distributed model.

Epidemic Characteristics
In this section, we determine the basic reproduction number, the final size of the epidemic, and maximum number of infected individuals for the delay model (9).

Basic Reproduction Number
To determine the initial exponential growth rate and the basic reproduction number R 0 , we use the equations for the infected compartments I(t) considered in the form: Suppose that, at the beginning of the epidemic, S(t) ≈ S 0 and S(t − τ) = S 0 . Then, from (10), we have Substituting Clearly, (11) has a solution λ = 0 and a non-zero solution with the sign determined by G (0). Denote We observe that G(λ) is an increasing function of λ and G(λ) → β S 0 N as λ → ∞. This implies that Equation (11) has a positive solution, i.e., there exists λ * > 0 such that G(λ * ) = λ * . If R 0 < 1, i.e., G (0) < 1, then for all λ ≥ 0. Therefore, the equation G(λ) = λ has no positive solution. Hence, the basic reproduction number is given by the following expression: (11) in the following form: The solution of this equation determines the initial exponential growth rate, which depends on the single parameter R 0 .

Final Size of the Epidemic
Next, we determine the final size of the susceptible compartment S f = lim t→∞ S(t). From (3a), we obtain: Then, integrating from 0 to ∞, we obtain Changing the order of integration, we get Now, integrating (9a) from 0 to ∞, we have: Since Thus, the final size can be obtained from the equation: Integrating (9c) and (9d) from 0 to ∞ and using (13), we obtain: Final size S f obtained from the numerical simulation and using the Formula (15) is verified for three different values of τ and a range of values of β ( Figure 1). Observe that Equation (15) can be written as: Note that the SIR model (4) gives the same equation for the final size, and also in the SIR model (4), if we assume that r 0 + d 0 = 1/τ, then the expression of R 0 is equivalent for both the SIR model (4) and the delay model (9).

Maximum Number of Infected Individuals
We now derive an approximate formula for the maximal number of infected individuals for the delay model (9). We have Substituting the relation (17), we obtain Integrating (19) from 0 to t m and changing the variable inside the first integral of the right-hand side, we obtain: We assume that S(t) = S 0 for all t ∈ [−τ, 0] and use the approximation ( Figure 2): Then from (20) we have Again, integrating (19) from t m − τ to t m , we obtain: Changing the variable inside the first integral of the right-hand side, we get: Now, using the approximation described in Figure 2, we conclude that Using the relation I(t m ) = S(t m − τ) − S(t m ), after some transformations, (22) can be written as: Using (18), we obtain: Solving (21) and (23), we can find x, y and, consequently, I m , S m . In Figure 3, we show a comparison between the maximum number of infected obtained by Equations (21) and (23) and the maximum number of infected obtained by direct numerical simulation of the delay model (9). From Figure 3, we can observe that the approximation gives a very close upper bound to the maximum number of infected.

Comparison of Models (3) and (9) and SIR (4)
We compare the results obtained from the distributed model (3), the delay model (9), and the conventional SIR model (4). In [31], we showed that Model (3) can describe the epidemic spread more precisely, and it can exactly capture the recovery and death dynamics by using suitable distributed recovery and death rates. However, the main constraint in using the distributed model (3) is the availability of distributed recovery and death rates. Instead, the average duration of the disease is easier to determine. We show that, in the delay model (9), if we take τ as the mean of the recovery and death distribution functions involved in the distributed model (3), then the delay model (9) gives a close solution to the distributed model (3).
To estimate the recovery and death distributions in the model (3), we use the function fitdist(:,gamma) in MATLAB. This function is used to fit a vector of data X = (x 1 , x 2 , · · · , x n ) by a gamma distribution of the form 1 b a Γ(a) x a−1 e −x/b , where a and b are the shape and scale parameters. This function gives the maximum likelihood estimators of a and b for the gamma distribution, which are the solutions of the simultaneous equations whereX is the sample mean of the data X and Ψ is the digamma function given by The function fitdist(:,gamma) estimates the shape and scale parameters with the 95% confidence interval. Using this statistical technique, we estimated the recovery and death distributions for the model (3) for the data used in [31] and given by the formulas Figure A1 in Appendix C), where the estimated parameter values are given in Table 1.
Here, p 0 is the survival probability, and its estimated value, from the data, is p 0 = 0.97 [34]. The estimated average time to recovery is 17.85 days and to death is 13.19 days. The value of survival probability p 0 is 0.97 [34], that is, out of 100 infected individuals, 97 infected will recover. This estimate matches with most of the COVID-19 epidemic data from various countries [34,35]. Thus, we can take average disease duration τ as 17.7 days. The corresponding value in the SIR model is r 0 It is essential to mention here that, instead of a gamma distribution, we can use the Erlang distribution E k = λ k x k−1 e −λx (k−1)! , x, λ ≥ 0 and k ∈ N. Since, for lower values of k, e.g., E 1 , E 2 distributions, with a reasonable choice of λ, give a significant probability of recovery or death at the beginning of infection, we should look for an Erlang distribution with higher values of k.
We found that the Erlang-8 distribution with λ = 0.4545, and the Erlang-6 distribution with λ = 0.4548, closely match with the gamma distributions for recovery and death rates respectively, as estimated above. The estimated values of the shape parameters a 1 and a 2 associated with the gamma distributions can be approximated as a 1 ≈ 8 and a 2 ≈ 6 for the recovery and death rate distributions, respectively. Thus, one can use the E 8 and E 6 distributions instead of the gamma distributions for recovery and death rates. However, the result will be similar in both cases, and we continue our numerical simulation with the gamma distributions. Though the parameters of the three models correspond to each other, the distributed model (3) and the delay model (9) both give far different dynamics as compared to the SIR model (4), whereas the distributed model (3) and the delay model (9) give reasonably close dynamics to each other (Figure 4).  (3) with the gamma distribution, the delay model (9), and the SIR model (4) is shown in Figure 5 for different values of the transmission rate β. As before, the maximal numbers of infected individuals I m in the models (3) and (9) are sufficiently close, but they are much higher than for the SIR model (4). The times to maximum infected t m in the models (3) and (9) are reasonably close, but less than for the SIR model (4). The final size of the epidemic S f is more or less the same for all three models. Similar properties are observed if the gamma distribution is replaced by the Erlang distribution with the corresponding parameters.
This result indicates the relevance of the delay model. If we do not have sufficient individual-level data to estimate the recovery and death distributions, but we have an approximate value of disease duration, then we can describe the epidemic progression in a sufficiently precise way.

Determination of Disease Duration from Data
In this section, we determine the disease duration τ from the epidemiological data for the daily number of infected J(t) and the total number of infected individuals I(t), using the equation Let I(t) have the maximum at t = t m . Set I(t m ) = I m . Then, J(t m ) = J(t m − τ), i.e., the daily number of infected is the same at two different time points t = t m and t = t m − τ. From the real data of the infected individuals I(t), we can find the day on which the daily number of active cases is maximal, and it determines t m . From the data of daily reported cases J(t), we can observe that J(t) crosses its maximum at some time before t m . Now, we have to find the value of J(t) such that J(t m ) will be equal to J(t m − τ), which in turn determines the disease duration τ. Hence, considering the delay model, using the real data of daily new cases J(t) and active cases I(t) around a peak, we can find the disease duration τ.
We illustrate this method using the data of J(t) and I(t) taken from [35] for COVID-19 in Italy. We collected the daily new reported data J(t) and active case data I(t) for Italy from 21 February 2020 to 31 May 2021 (which capture the first three peaks in Italy) and from 10 November 2021 to 28 February 2022 (which capture the peak due to Omicron in Italy). To have smoother data, we used the 7-day moving average, the data on the j-th day replaced by the average data from the (j − 3)th day to the (j + 3)th day. As the concerned method is focused on the peaks, the error at the beginning and end of the time interval is not essential. In Italy, during the first peak (in April 2020), the peak of I(t) is attained at t m = 51 (Figure 6a) and the peak of J(t) is attained before on t = 51, which is less than t m (Figure 6b). First, we find that J(t m = 51) = 4.17 × 10 3 and then find J(32) = 4.15 × 10 3 ≈ J(t m = 51). This implies J(t m − τ) = J (32), and consequently, we can calculate τ = 19 as the disease duration during the first peak. Similarly, during the second peak (in November 2020) and third peak (in March 2021), we estimated the disease duration as τ = 20 days and τ = 14 days, respectively. The peak of epidemic progression due to Omicron in January 2022 is shown in Figure 6c,d, and we estimated the value of τ as 11 days. Similarly, the value of τ is estimated for some other countries ( Table 2).

Remark 1.
It is important to note that in the case of the combination of strains, the estimated value of τ corresponds to the weighted average of the strain-specific disease duration. Furthermore, note that this method depends on how the daily cases of infection were reported. However, in many cases, there is a dominant variant during epidemic outbreaks. Thus, the estimated value of τ can be considered as the disease duration corresponding to the dominant variant during a specific epidemic wave.

Remark 2.
Note that the delay model (9) does not take into account the difference in duration to recovery and the duration to death. If we assume that the death cases are relatively rare (e.g., approximately ≤ 2% in the case of COVID-19), then this difference may not be very essential.

Remark 3.
Note that this method of the estimation of τ remains the same for time-varying β = β(t). Thus, the method discussed above does not depend on whether β is a constant or β varies with respect to time. Since the time-varying β(t) is capable of incorporating the effect of all possible infected compartments (i.e., exposed E(t), asymptomatic I A (t), symptomatic I S (t), hospitalized H(t), etc.) on disease transmission, we can use this method to obtain an estimation of τ for any infectious disease with available data.

Model Validation with Epidemiological Data
In order to validate the delay model (9) and the method of estimation of τ, we compared the results with the epidemiological data. We used the estimated value of τ obtained by the method described above during different peaks of COVID-19.
In Italy, the estimated value of τ is 19 days, 20 days, 14 days, and 11 days during the first peak (April 2020), the second peak (November 2020), the third peak (March 2021), and the fourth peak (January 2022), respectively. We assumed that the disease duration τ remains τ = 19 days from 21 February 2020 to 15 August 2020; τ = 20 days from 16 August 2020 to 19 February 2021; τ = 14 days from 20 February 2021 to 21 May 2021; and 11 days from 10 November 2021 to 28 February 2022 (during Omicron in Italy).
Once these parameter values were determined, we took the number J(t) of daily infected individuals from the epidemiological data [35] and found the sum of daily recoveries and deaths from the expression These results were compared with the sum of recoveries and deaths in the data. Figure 7 shows the result of such a comparison for Italy from 21 February 2020 to 21 May 2021 and from 10 November 2021 to 28 February 2022, with the data from [35] (7-day moving average). Recoveries and deaths can also be determined as a proportion of active cases σ(t) = (r 0 + d 0 )I(t), as is done in the SIR model. Here, I(t) is taken from the data and r 0 + d 0 = 1/τ, and we observed that the SIR model overestimates the sum of recovered and dead. Thus, the delay model (9) gives a good description of the recovery and death compared with the epidemiological data, while the SIR model overestimates them.

Discussion
We proposed a delay model under the assumption that the infected individuals recover or die exactly after an average disease duration τ. Generally, in the case of COVID-19, we observed that the recovery and death distributions follow unimodal or bimodal gamma distributions [34,36]. Estimating these gamma distributions requires individuallevel data with the date of onset of disease and the date of recovery or death, which may be very difficult to gather for every country or province. Instead, the only information about the disease duration can help us to obtain sufficiently good results using the delay model. Furthermore, we developed a method to estimate the disease duration from the epidemiological data. It is important to mention here that one can use the Erlang distributions, instead of the gamma distributions, for a compartmental epidemic model with multi-phase disease transition [37].
Let us note that we consider only symptomatic individuals in the model. The influence of asymptomatic individuals is widely discussed in the COVID-19 literature. According to some estimates, they can constitute between 25% and 50% of the total number of cases [38,39]. On the other hand, the infectivity of asymptomatic individuals is much lower than the infectivity of symptomatic individuals because infectivity is proportional to the viral load in the upper respiratory tract [40] and symptoms correlate with viral load. Hence, in the first approximation, we can consider only symptomatic individuals. Further studies are needed to take into account asymptomatic individuals more precisely.
We noticed that during different peaks of COVID-19, the estimated value of τ was different. This difference might be due to different strains or the change of proportion of different infected compartments (such as asymptomatic, hospitalized) that we counted in the same compartment.
The presence of exposed individuals can be taken into account by means of timedependent infectivity rate β(t − η). For the individuals infected at time η, their infectivity at time t depends on the difference t − η. The function β(t − η) is small if the difference t − η is small, which is the case of exposed individuals. This case was studied in [32]. We did not consider the time-dependent infectivity rate in this work since its main objective is to compare the model with distributed recovery and death rates with the delay model.
We compared the final size of the epidemic and maximal number of infected obtained in the Formulas (15), (21), and (23), respectively. Note that these formulae depend on τ and β. Since the outbreak due to Omicron is the only one when the social distancing measures such as lockdowns or isolation were not strictly imposed, we can assume that the transmission rate β is approximately constant. We took the data of the Omicron outbreak in Italy from [35] from 10 November 2021 to 10 December 2021, and we fit the delay model to these 40 days of data and estimated the disease transmission rate β as 0.118. We took the value of disease duration τ = 11 days, which was obtained by using the method discussed in Section 5. Then, using the formula (15), we calculated the final size of the epidemic as 3.387 × 10 7 . Using the formulae (21) and (23), we calculated the maximal number of infected as I m = 3.4139 × 10 6 , whereas the maximal number of infected was I m = 2.7317 × 10 6 in the data. Thus, the formulae (21) and (23) give a reasonable estimate of the maximal number of infected. Similarly, we obtained an accurate estimate of the maximal number of infected for some other countries (not shown).
Let us finally note that the delay model presented in this work is simple and generic. It describes epidemic progression with two parameters β and τ, which can be easily estimated from the data. Our next goal will be to apply the proposed modeling approach to multicompartment models consisting of different groups of susceptible and/or infected and to immuno-epidemic models with time-varying recovery and death rates [32]. It is also interesting to check the applicability of the proposed model to other transmissible diseases.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Review of Delay Models
Time delay in epidemic models accounts for the delay for an individual to become infectious after being infected. Time delay for vector-born diseases characterizes the time needed for the pathogen to reach a certain threshold sufficient for infection transmission. Under the assumption that the infection transmission rate at time t depends on the number of infected at time t − τ and the number of susceptible at time t, the delay SIR model for vector-born diseases is given by the following system of equations (see [41] and the references therein): where S(t) represents susceptible and I(t) infectious compartments, the function f (S(t), I(t − τ)) characterizes the disease transmission rate, δ is the clearance rate, and τ is the time delay from the moment of infection to disease transmission. Another type of delay model describes temporary immunity [28]: where ω is the time period after which an infected individual becomes susceptible again. For long-lasting epidemics such as HIV, new generations of susceptible individuals influence epidemic dynamics. In this case, time delay corresponds to the maturation period after which young adults become susceptible to infection (see [42][43][44] and the references therein): Here, N(t) = S(t) + I(t), g(N(t − τ)) is the susceptible recruitment function, which incorporates the maturation delay τ and h(S, I) is the disease transmission rate.

Appendix B. Positiveness of the Delay Model (9)
We show the positiveness of the solution of the delay model (9). Note that, as per our assumption, J(t) is a positive function. From (9a), we observe that, if S(t * ) = 0 at some time point t * , then dS dt | t=t * = 0. This implies that S(t) ≥ 0 for t > 0. From (3c) and (3d), we obtain that R(t) and D(t) are both increasing functions, and hence, R(t) and D(t) also remain positive for all t. Next, we prove that I(t) > 0. Integrating (9c) from 0 to t, we obtain Since, J(t) = 0 for t < 0, we can write: Therefore, I(t) is positive for all t. This shows the positiveness of the solution of System (9).