# Evolution Model for Epidemic Diseases Based on the Kaplan-Meier Curve Determination


## Abstract


## 1. Introduction

## 2. Material and Methods

`incidence` of R, the software that we use, is explained in Reference [13]. Some specific applications of epidemic modelling using this framework to the current Covid-19 crisis have already been published (see, for example, Reference [20]).

- (1) There are individuals who test negative for infection but are still able to spread the disease to others. They should therefore be considered active from the point of view of the virus, that is, as infected individuals, for at least a fixed period of time N.
- (2) There are cases that reappear as infected after being counted as recovered individuals.

#### 2.1. Latent Cases and Resuscitation Rate

#### 2.2. Cumulative Function and Complete Model

#### 2.3. Dynamic Estimate of the Number of Post Infection Individuals

#### 2.4. Probabilistic Model for the Evolution of a Viral Epidemic Process Based on the Kaplan-Meier Curve

- Let $I:[0,\infty )\to \mathbb{R}$ be an integrable function representing the new cases of sick people: $I\left(t\right)$ is the number of new patients introduced into the model at time $t.$
- Let $E:[0,\infty )\to \mathbb{R}$ be another integrable function representing the cases that are out of the process at time $t$: $E\left(t\right)$ is the sum of the dead at that time $t$ plus the recovered patients. We can use exponential expressions ${\sigma}^{t}$ such as the ones explained above.
- Let $\mathcal{P}:[0,\infty )\to [0,1]$ be the function explained above, which represents the probability of survival of the virus at time $t$, that is, the probability that a confirmed individual continues to be infected.
- The formula that gives the relation among these terms is then
  $$E\left(s\right)=(I*\mathcal{P})\left(s\right)={\int}_{[0,s]}I\left(t\right)\,\mathcal{P}(s-t)\,dt.$$

As we have shown in the previous development of our formalism, in this paper we use the discrete version of this formula; that is, $dt$ is the counting measure. The general model given by the convolution formula can be used when new cases can enter at any time, not necessarily daily. In that case, continuous variables and functions over the Lebesgue measure space seem more convenient, but the formula describing the model is essentially the same.

Note that in some relevant cases (for example, counts of individuals provided by all the countries in the world with Covid-19) we cannot assume that all the individuals counted as confirmed ($I$) are controlled in the process (i.e., some of them were never at a hospital or passed away without being recorded). In other words, the equation $I=M+F$, with $M$ the dead and $F$ the recovered people, cannot be assumed to hold in general at the end of the epidemic process. So, due to the lack of correct information, we could have infected individuals who have been detected but are not controlled by the health systems. Therefore, we have to consider another (undetermined) parameter $0<\gamma \le 1$ that represents this fact, such that the balance equation becomes $I\,\gamma =M+F,$ and so $\gamma $ is given by
$$\gamma =1-\underset{s\to \infty}{\lim}\mathcal{P}\left(s\right)\in (0,1].$$
However, in the rest of the paper we will assume that, having no other source of information, at the end of the process all individuals who were counted as infected have been counted either as recovered or as dead.
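The discrete version of the convolution can be sketched in a few lines of R. This is only an illustration under stated assumptions: the helper name `discrete_convolution` and the survival values in `P_toy` are hypothetical, while the daily counts are the first five entries of the infected series in Table 2.

```r
# Discrete convolution (I * P)(s) = sum over t <= s of I(t) * P(s - t).
# R vectors are 1-indexed, so position s stands for time s - 1.
discrete_convolution <- function(I, P) {
  stopifnot(length(I) == length(P))
  sapply(seq_along(I), function(s) sum(I[1:s] * P[s:1]))
}

# First five daily counts of new infected individuals (Table 2)
I_new <- c(16, 12, 22, 48, 36)
# Hypothetical survival probabilities P(0), ..., P(4); NOT estimated from data
P_toy <- c(1, 0.8, 0.5, 0.2, 0)

conv_toy <- discrete_convolution(I_new, P_toy)
print(conv_toy)
```

Each entry of `conv_toy` weights the cases that entered `s - t` days earlier by the value of the function at that lag, which is exactly what the integral does when $dt$ is the counting measure.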

#### 2.5. Least Squares Fitting of the Model

#### 2.6. A Direct Estimate of the Associated Probabilities

**Step 1.** Define the Lagrangian function

**Step 2.** In order to do so, note that

**one** $N\le k\le s,$ removing the $k$-th equation in the system above when ${y}_{k}=0$ is assumed, and solving the system again. In this case we get one equation and one variable less. The rest of the optimal values have to be $\ge 0$ too. Now we have to solve all the systems of equations that appear following this rule. If there is at least one solution, we have to compute all of them and compare the errors. The one with the smallest error is the right one; if there is more than one with the smallest error, we take their mean.

**Step 3.** As we said, the result computed in the previous steps depends on ${\mu}_{s}.$ Consequently, for a consistent fitting of the experimental data we need to compute the best parameter $0\le {\mu}_{s}\le 1.$ In order to do so, we consider the associated error $\epsilon \left(s\right)$ written above, which in fact depends on ${\mu}_{s};$ that is, $\epsilon \left(s\right)$ has to be replaced by ${\epsilon}_{{\mu}_{s}}\left(s\right).$ The final solution is then given by solving the optimization problem
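The enumeration of Step 2 and the outer search over ${\mu}_{s}$ of Step 3 can be illustrated on a tiny problem. The sketch below is not the authors' exact Lagrangian system: it assumes a generic least squares problem $\min \|Jv-E\|_2$ subject to $\sum v=\mu$ and $v\ge 0$, solves the equality-constrained problem on every possible set of nonzero variables through its standard KKT linear system, and keeps the nonnegative solution with the smallest error. The names `solve_on_support` and `best_nonneg_fit`, the toy data and the grid over `mu` are all hypothetical.

```r
# Least squares restricted to the columns in `keep`, subject to sum(v) = mu,
# solved through the KKT system of the equality-constrained problem.
solve_on_support <- function(J, E, keep, mu) {
  Jk <- J[, keep, drop = FALSE]
  n <- length(keep)
  K <- rbind(cbind(2 * crossprod(Jk), rep(1, n)),
             c(rep(1, n), 0))
  rhs <- c(2 * crossprod(Jk, E), mu)
  v <- numeric(ncol(J))
  v[keep] <- solve(K, rhs)[seq_len(n)]
  v
}

# Try every nonempty set of variables allowed to be nonzero (the others are
# fixed to 0, i.e., their equations are removed) and keep the best one.
best_nonneg_fit <- function(J, E, mu, tol = 1e-8) {
  p <- ncol(J)
  best <- list(v = NULL, err = Inf)
  for (mask in seq_len(2^p - 1)) {
    keep <- which(as.integer(intToBits(mask))[seq_len(p)] == 1L)
    v <- tryCatch(solve_on_support(J, E, keep, mu), error = function(e) NULL)
    if (is.null(v) || any(v < -tol)) next   # singular system or negative value
    err <- sqrt(sum((J %*% v - E)^2))
    if (err < best$err) best <- list(v = v, err = err)
  }
  best
}

# Toy data (hypothetical, chosen so the result can be checked by hand)
J_toy <- diag(3)
E_toy <- c(0.5, 0.7, -0.4)

# Outer search over mu on a coarse grid in [0, 1]
mu_grid <- seq(0, 1, by = 0.1)
errs <- sapply(mu_grid, function(mu) best_nonneg_fit(J_toy, E_toy, mu)$err)
mu_best <- mu_grid[which.min(errs)]
fit_best <- best_nonneg_fit(J_toy, E_toy, mu_best)
print(mu_best)
print(fit_best$v)
```

Full enumeration grows as $2^{p}$ and is only feasible for very small problems. The sketch also keeps the first solution found when several share the smallest error, rather than averaging them as the text prescribes.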

#### 2.7. Incorporating Karush-Kuhn-Tucker Conditions

**Step 1.** If $\mu >0,$ we have that ${\sum}_{k=N}^{s}{\alpha}_{k}^{2}=1,$ and the problem reduces to the case given in Section 2.6, for ${\mu}_{s}=1.$ After getting the solution, we have to check that $\mu >0.$

**Step 2.** If $\mu =0,$ we have that the system is given by the equations

#### 2.8. Functional Estimate of the Survival Model

## 3. Results: Computational Methods for Estimating the Kaplan-Meier Curve

#### 3.1. Monte Carlo Direct Approach

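```r
### Define the error function using matrix J and vector E
ErrorVec <- function(v) {norm((J) %*% v - E, type = "2")}
### Fix the expo parameter
expo <- 10
### Starting variables
mc <- c(1:24)
mcfin <- c(1:24)
ermc <- 100000000
for (k in 1:1000000) {
  ### Random nonnegative candidate whose total is a random value in [0, 1]
  mc0 <- runif(24, min = 0, max = 1)
  mc <- (mc0 / sum(mc0)) * (runif(1, min = 0, max = 1))^(1 / expo)
  ### A candidate worse than the current best is replaced by the zero vector
  if (ErrorVec(mc) > ermc) {mc <- c(1:24) * 0}
  ### Update the best error and keep the corresponding vector in mcfin
  ermc <- min(c(ermc, ErrorVec(mc)))
  if (ErrorVec(mc) <= ermc) {mcfin <- mc}
}
```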

`mcfin` in the script, written in the natural order, are presented in Table 6 (we write only 4 digits).
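The role of the `expo` parameter in the script can be checked numerically. Every candidate is a normalized vector multiplied by `runif(1)^(1/expo)`, so the sum of its coefficients equals that factor, whose mean for a uniform variable is `expo/(expo + 1)`. The sketch below, with the hypothetical helpers `sample_candidate` and `mean_total`, estimates this mean for the three values of `expo` considered in the paper.

```r
set.seed(1)

# One Monte Carlo candidate, built as in the script above
sample_candidate <- function(n, expo) {
  mc0 <- runif(n, min = 0, max = 1)
  (mc0 / sum(mc0)) * (runif(1, min = 0, max = 1))^(1 / expo)
}

# Average total of the coefficients over many candidates
mean_total <- function(expo, draws = 20000, n = 24) {
  mean(replicate(draws, sum(sample_candidate(n, expo))))
}

expo_values <- c(1, 5, 10)
totals <- sapply(expo_values, mean_total)
expected <- expo_values / (expo_values + 1)   # mean of U^(1/expo), U uniform
print(rbind(expo = expo_values, simulated = totals, expected = expected))
```

Larger values of `expo` therefore concentrate the sampled totals near 1, which is the region where the sum of the coefficients of a survival function is expected to lie.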

#### 3.2. Sampling on the Configurations of Local Minima of the Optimization Problem Using Karush-Kuhn-Tucker Conditions

**Step 2** in the algorithm of Section 2.6). The algorithm can be divided into two cases:

`QQ0`, `QQ1` and `ult0`) that allow us to write and solve the linear systems with the necessary requirements explained in the resolution method. This is done by using the script:

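```r
### QQ0: row j (j < s) has 1 in column j and -1 in column j + 1; row s is zero
QQ0 <- matrix(nrow = s, ncol = s, byrow = FALSE, dimnames = NULL)
for (j in 1:s) {
  for (k in 1:s) {
    if (k > j + 1 | k < j) {
      QQ0[j, k] <- 0
    } else {
      if (j == k & j < s) {QQ0[j, k] <- 1} else {QQ0[j, k] <- -1}
    }
    QQ0[s, s] <- 0
  }
}
### QQ1: zero matrix whose last row is a row of ones
QQ1 <- matrix(nrow = s, ncol = s, byrow = FALSE, dimnames = NULL)
for (j in 1:(s - 1)) {
  for (k in 1:s) {
    QQ1[j, k] <- 0
    QQ1[s, k] <- 1
  }
}
### ult0: last vector of the canonical basis
ult0 <- rep(0, s)
ult0[s] <- 1
```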

```r
### Exact solution without removing any equation:
Z <- solve(QQ0 %*% t(J) %*% J + QQ1, QQ0 %*% t(J) %*% E + ult0)
### Adapted functions providing the solution and the error
### when the equations labeled by the vector w are removed.
Sol1 <- function(w) {
  m <- c(1:length(w))
  A <- QQ0[-m, -m] %*% ((t(J) %*% J)[-w, -w]) + QQ1[-m, -m]
  b <- QQ0[-m, -m] %*% (t(J) %*% E)[-w] + ult0[-m]
  solve(A, b)
}
Error1 <- function(w) {norm((J[, -w]) %*% Sol1(w) - E, type = "2")}
```
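The structure of the linear system behind `Z` can be read off on a small example. In the sketch below, `QQ0`, `QQ1` and `ult0` are built in vectorized form for $s=4$ (they coincide with what the loops of the script produce), and `J_toy` and `E_toy` are random, hypothetical stand-ins for the actual data. The first $s-1$ rows of the system force consecutive components of the gradient $J^{T}(JZ-E)$ to be equal, which is the stationarity condition with the Lagrange multiplier eliminated, and the last row imposes that the coefficients sum to 1.

```r
set.seed(42)
s <- 4

# Hypothetical data, only used to illustrate the structure of the system
J_toy <- matrix(runif(6 * s), nrow = 6, ncol = s)
E_toy <- runif(6)

# Vectorized construction of the auxiliary matrices
idx <- seq_len(s - 1)
QQ0 <- matrix(0, s, s)
QQ0[cbind(idx, idx)] <- 1        # 1 on the diagonal (rows 1 to s - 1)
QQ0[cbind(idx, idx + 1)] <- -1   # -1 on the superdiagonal
QQ1 <- rbind(matrix(0, s - 1, s), rep(1, s))
ult0 <- c(rep(0, s - 1), 1)

# Same system as in the script, with the toy data
Z <- solve(QQ0 %*% t(J_toy) %*% J_toy + QQ1,
           QQ0 %*% t(J_toy) %*% E_toy + ult0)

# Gradient of the squared error (up to a factor 2) at the solution
g <- as.vector(t(J_toy) %*% (J_toy %*% Z - E_toy))
print(sum(Z))    # constraint: the coefficients sum to 1
print(diff(g))   # stationarity: all gradient components coincide
```

Nothing in this system prevents negative components in `Z`, which is why the script then removes equations and re-solves.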

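```r
er <- 100000
Sol2 <- c(1:s)
for (q in 1:(s - 2)) {
  for (k in 1:10000) {
    ### Random set w of q equations to be removed
    w <- c(sample(1:s, q, replace = FALSE))
    sol <- Sol1(w)
    ### Accept only solutions whose s - q components are all nonnegative
    ### (up to a small tolerance) and that do not increase the error
    if (all(sol >= -0.00001)) {
      err <- Error1(w)
      if (er >= err) {
        er <- err
        Sol2 <- sol
        w1 <- w
      }
    }
  }
}
### ErrorCaseEqual1 is the error for this case
ErrorCaseEqual1 <- er
```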

```r
### Sol() and Error() must be defined beforehand for this case
er <- 100000
for (q in 1:(s - 2)) {
  for (k in 1:10000) {
    w <- c(sample(1:s, q, replace = FALSE))
    sol <- Sol(w)
    ### Keep only solutions with sum <= 1 and nonnegative components
    if (sum(sol) <= 1) {
      if (all(sol >= -0.00001)) {
        err <- Error(w)
        if (er >= err) {
          er <- err
          ### Sol0 is the solution
          Sol0 <- sol
          ### w0 indicates which are the equations that have to be removed
          w0 <- w
        }
      }
    }
  }
}
### ErrorCaseLess1 is the error for this case
ErrorCaseLess1 <- er
```

#### 3.3. Genetic Algorithms

`fitness` function [24]. We benefit both from the good results obtained with GAs and from their capability to handle a wide variety of problems with different degrees of complexity, which explains their wide use. Our GA has been designed to obtain an approximate solution to the problem by defining a new error that balances the error $\epsilon \left(s\right)$ and the estimate of the cumulative sum $|{\sum}_{k=1}^{s}{\alpha}_{k}^{2}-{\mu}_{s}|,$ where ${\mu}_{s}$ can be tuned to improve the result using additional information, starting for example with ${\mu}_{s}=1.$ We define as a `fitness` function:

`popSize`) that represents the number of possible solutions that the algorithm evaluates with the fitness function in each iteration, and the total number of iterations (`maxiter`). The GA progresses by applying the genetic operators (crossover and mutation) to the members of the population to produce the offspring that will form part of the population in the next iteration [24]. We have considered values of `popSize` of 100, 250 and 500 in combination with `maxiter` taking values equal to 5000, 25,000 and 50,000. In Table 8 we can see values of the relative error, defined as the norm of the difference between the approximated solution obtained with GA and the solution obtained in Section 3.2. The larger the number of iterations and the population size, the lower the error.

`maxiter` = 50,000 and `popSize` $=250$. The CPU time on a 2015 MacBook laptop (dual-core Intel Core i5, 2.7 GHz) with 8 GB of memory is less than 30 min. With these values fixed, we have studied for this particular case the value of the quotient ${\gamma}_{1}/{\gamma}_{2}$ for which we obtain the best approximate solution. This has been obtained for ${\gamma}_{1}/{\gamma}_{2}=9$. The solution obtained can be seen in Figure 8 and Figure 9. It can be observed that the approximation is very good, the error being very small. This value of the quotient means that, in this particular case, the first part of the error function is much more important than the second part, related to the condition $|{\sum}_{k=1}^{s}{\alpha}_{k}^{2}-{\mu}_{s}|$. In any case, we want to point out that the extreme values ${\gamma}_{1}=1.0$ and ${\gamma}_{2}=0$, which can be implemented numerically, do not produce an acceptable solution; in other words, the condition $|{\sum}_{k=1}^{s}{\alpha}_{k}^{2}-{\mu}_{s}|$ is necessary for obtaining a good approximation. Finally, this is not the general situation and, with real data from Covid-19 [27], the best fit is obtained with a much more balanced quotient, the usual value being ${\gamma}_{1}/{\gamma}_{2}=1$.

```r
library("GA")
### Build matrix J (and vector E) with the data before running this script.
s <- 23 ### Dimension of the problem
### Define the relative error function using matrix J and vector E
### (as.matrix is needed because norm() only accepts matrices)
ErrorVec <- function(v) {norm((J) %*% v - E, type = "2") / norm(as.matrix(E))}
### Starting variables
mu_s <- 1
g1 <- 1
g2 <- 1
### Define the fitness function; its argument x is the candidate
### solution vector of the problem
fitness <- function(x) 1 / (g1 * ErrorVec(x) + g2 * abs(sum(x) - mu_s))
### Define upper and lower bounds for the values
### of the components of the solution vector
lb <- rep(0, s)
ub <- rep(1, s)
### Finally, monitor = FALSE avoids printing the progress on screen.
### See the documentation of the GA package for more details.
Ga <- ga(type = "real-valued", fitness = fitness,
         lower = lb, upper = ub,
         popSize = 250, maxiter = 50000, monitor = FALSE,
         seed = 123456)
### The solution is saved in variable Sol
Sol <- c(Ga@solution)
```
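The effect of the weights on the fitness can be explored without running the genetic algorithm. The sketch below uses a hypothetical helper `make_fitness` and toy data, and assumes, only for illustration, that the weights are normalized so that `g1 + g2 = 1` (so the quotient 9 corresponds to `g1 = 0.9`, `g2 = 0.1`) and that the relative error uses the Euclidean norm in both numerator and denominator.

```r
# Build a fitness function for given data and weights (hypothetical helper)
make_fitness <- function(J, E, g1, g2, mu_s) {
  E_norm <- sqrt(sum(E^2))
  function(x) {
    rel_err <- sqrt(sum((J %*% x - E)^2)) / E_norm   # relative fitting error
    sum_penalty <- abs(sum(x) - mu_s)                # distance of the sum to mu_s
    1 / (g1 * rel_err + g2 * sum_penalty)
  }
}

# Toy data, NOT taken from the paper
J_toy <- diag(2)
E_toy <- c(0.5, 0.5)
fit <- make_fitness(J_toy, E_toy, g1 = 0.9, g2 = 0.1, mu_s = 1)

f_a <- fit(c(1, 0))        # satisfies the sum condition but fits poorly
f_b <- fit(c(0.5, 0.25))   # fits better but violates the sum condition
print(c(f_a = f_a, f_b = f_b))
```

With this quotient the second candidate gets the higher fitness, since the fitting error dominates the penalty on the sum. A function built this way can be passed as the `fitness` argument of `ga()`.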

## 4. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Ai, T.; Yang, Z.; Hou, H. Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology **2020**, 200642.
2. Chen, D.; Xu, W.; Lei, Z.; Huang, Z.; Liu, J.; Gao, Z.; Peng, L. Recurrence of positive SARS-CoV-2 RNA in COVID-19: A case report. Int. J. Infect. Dis. **2020**, 93, 297–299.
3. Monto, A.S.; Cowling, B.J.; Peiris, J.S.M. Coronaviruses. Viral Infect. Hum. Epidemiol. Control **2014**, 199–223.
4. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. **1958**, 53, 457–481.
5. Kenah, E. Contact intervals, survival analysis of epidemic data, and estimation of $R_{0}$. Biostatistics **2011**, 12, 548–566.
6. Kenah, E. Non-parametric survival analysis of infectious disease data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **2013**, 75, 277–303.
7. Ogluszka, M.; Orzechowska, M.; Jedroszka, D.; Witas, P.; Bednarek, A.K. Evaluate Cutpoints: Adaptable continuous data distribution system for determining survival in Kaplan-Meier estimator. Comput. Methods Programs Biomed. **2019**, 177, 133–139.
8. Brauer, F. Compartmental models in epidemiology. In Mathematical Epidemiology; Springer: Berlin/Heidelberg, Germany, 2008; pp. 19–79.
9. Choisy, M.; Guegan, J.F.; Rohani, P. Mathematical modeling of infectious diseases dynamics. In Encyclopedia of Infectious Diseases: Modern Methodologies; Tibayrenc, M., Ed.; John Wiley & Sons: Hoboken, NJ, USA, 2007; Chapter 22; pp. 379–404.
10. Hethcote, H.W. The mathematics of infectious diseases. SIAM Rev. **2000**, 42, 599–653.
11. Silal, S.P.; Little, F.; Barnes, K.I.; White, L.J. Sensitivity to model structure: A comparison of compartmental models in epidemiology. Health Syst. **2016**, 5, 178–191.
12. Kermack, W.O.; McKendrick, A.G. A Contribution to the Mathematical Theory of Epidemics. Proc. R. Soc. A **1927**, 115, 700–721.
13. Kamvar, Z.N.; Cai, J.; Pulliam, J.R.; Schumacher, J.; Jombart, T. Epidemic curves made easy using the R package incidence. F1000 Res. **2019**, 8, 139.
14. Bailey, N.T. The Elements of Stochastic Processes with Applications to the Natural Sciences; John Wiley & Sons: New York, NY, USA, 1990; Volume 25.
15. Bastin, G. Lectures on Mathematical Modelling of Biological Systems. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.465.8665&rep=rep1&type=pdf (accessed on 27 March 2020).
16. Keeling, M.J.; Danon, L. Mathematical modelling of infectious diseases. Br. Med. Bull. **2009**, 92, 33–42.
17. Brown, G.D.; Oleson, J.J.; Porter, A.T. An empirically adjusted approach to reproductive number estimation for stochastic compartmental models: A case study of two Ebola outbreaks. Biometrics **2016**, 72, 335–343.
18. Huppert, A.; Katriel, G. Mathematical modelling and prediction in infectious disease epidemiology. Clin. Microbiol. Infect. **2013**, 19, 999–1005.
19. Paul, M. Foreseeing the future in infectious diseases: Can we? Clin. Microbiol. Infect. **2013**, 19, 991–992.
20. Roosa, K.; Lee, Y.; Luo, R.; Kirpich, A.; Rothenberg, R.; Hyman, J.M.; Yan, P.; Chowell, G. Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang, China: February 13–23, 2020. J. Clin. Med. **2020**, 9, 596.
21. Jiang, H.; Fine, J.P. Survival analysis. In Topics in Biostatistics; Humana Press: Totowa, NJ, USA, 2007; pp. 303–318.
22. Kleinbaum, D.G.; Klein, M. Survival Analysis; Springer: New York, NY, USA, 2010.
23. Nocedal, J.; Wright, S. Numerical Optimization; Springer Science & Business Media: Berlin, Germany, 2006.
24. Yu, X.; Gen, M. Introduction to Evolutionary Algorithms; Springer-Verlag: Berlin, Germany, 2010.
25. Scrucca, L. Package ‘GA’-CRAN-R Project. Available online: https://luca-scr.github.io/GA/ (accessed on 27 March 2020).
26. Scrucca, L. GA: A Package for Genetic Algorithms in R. J. Stat. Softw. **2013**, 53.
27. Calabuig, J.M.; García-Raffi, L.M.; García-Valiente, A.; Sánchez-Pérez, E.A. Kaplan-Meier type survival curves for COVID-19: A health data based decision-making tool. arXiv **2020**, arXiv:2005.06032.

**Figure 1.** Daily data of new confirmed individuals (black line) and the new dead (red line). The dashed lines indicate the corresponding average. (Figures in HTML format are in Supplementary Materials).

**Figure 2.** Simulation of the daily hospital discharges given by the function F. The dashed line is the average and the marked point is the maximum.

**Figure 3.** Actual data of the cumulative sum function E, that is, the cumulative sum of the dead plus recovered individuals. The dashed line corresponds to the average.

**Figure 4.** Representation of function E consisting of cumulative recovered individuals plus the dead (black line) and its Monte Carlo approximation (red line). The bar graph is the pointwise error with the maximum (red point) and the average (dashed line).

**Figure 5.** Joint representation of the data in E (red line) and its approximation for the restriction case $sum=1$ (black line). The bar graph is the pointwise error with the maximum (red point) and the average (dashed line).

**Figure 6.** Joint representation of the data in E (red line) and its approximation for the restriction case $sum<1$ (black line). The bar graph is the pointwise error with the maximum (red point) and the average (dashed line).

**Figure 7.** Representation of the survival function estimates for the three methods. The red line is the crude Monte Carlo estimate (large error, 422.7197), the black one is the genetic algorithm result, and the red one is the exact approximation based on the Karush-Kuhn-Tucker conditions. The errors for this last, better method are 89.7183 when the constraint $sum=1$ is assumed, and 131.0098 for the case $sum<1.$

| Function | Meaning |
|---|---|
| $I\left(s\right)$ | number of new infections at time $s$ |
| $M\left(s\right)$ | number of dead patients at time $s$ |
| $F\left(s\right)$ | number of patients considered recovered at time $s$ |

**Table 2.** Data record of infected people and the dead of the first 24 days of the Covid-19 epidemic in Spain (natural order from left to right). Data source: https://github.com/datadista/datasets/tree/master/COVID%2019.

| Series | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Infected (days 1–12) | 16 | 12 | 22 | 48 | 36 | 48 | 39 | 128 | 65 | 159 | 410 | 623 |
| Infected (days 13–24) | 506 | 822 | 1259 | 1544 | 2000 | 1438 | 1987 | 2538 | 3431 | 2833 | 4946 | 3646 |
| Dead (days 1–12) | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 3 | 9 | 0 | 18 | 12 |
| Dead (days 13–24) | 37 | 36 | 16 | 152 | 21 | 182 | 107 | 169 | 235 | 324 | 394 | 462 |

**Table 3.** Simulation of the recovered patients provided by the function $F\left(t\right)$ (natural order from left to right).

| Series | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Recovered (days 1–12) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.62 | 1.08 | 1.87 |
| Recovered (days 13–24) | 3.23 | 5.54 | 9.48 | 16.17 | 27.54 | 46.85 | 79.61 | 135.15 | 229.31 | 478 | 540 | 450 |

**Table 4.** Cumulative sum of the addition of the recovered individuals and the dead (natural order from left to right).

| Series | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Recovered and dead (days 1–8) | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 7 |
| Recovered and dead (days 9–16) | 16 | 16.62 | 35.7 | 49.58 | 89.8 | 131.34 | 156.82 | 324.99 |
| Recovered and dead (days 17–24) | 373.54 | 602.39 | 789 | 1093.15 | 1557.46 | 2359.46 | 3293.46 | 4205.46 |
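The relation between Tables 2, 3 and 4 can be verified directly: the vector $E$ of Table 4 is the cumulative sum of the daily dead (Table 2) plus the daily simulated recovered patients (Table 3). The following sketch reproduces it in R; the variable names are illustrative, and small differences of about 0.01 are expected because the published values are rounded to two decimals.

```r
# Daily dead (Table 2) and simulated daily recovered patients (Table 3)
dead <- c(0, 0, 0, 0, 0, 2, 2, 3, 9, 0, 18, 12,
          37, 36, 16, 152, 21, 182, 107, 169, 235, 324, 394, 462)
recovered <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0.62, 1.08, 1.87,
               3.23, 5.54, 9.48, 16.17, 27.54, 46.85, 79.61, 135.15,
               229.31, 478, 540, 450)

# Cumulative values reported in Table 4
E_table <- c(0, 0, 0, 0, 0, 2, 4, 7,
             16, 16.62, 35.7, 49.58, 89.8, 131.34, 156.82, 324.99,
             373.54, 602.39, 789, 1093.15, 1557.46, 2359.46, 3293.46, 4205.46)

# Cumulative sum of the individuals that leave the process each day
E_computed <- cumsum(dead + recovered)
max_diff <- max(abs(E_computed - E_table))
print(max_diff)   # only rounding differences remain
```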

| expo | 1 | 5 | 10 |
|---|---|---|---|
| Error | 454.0506 | 431.0416 | 422.7197 |

**Table 6.** Coefficients ${\alpha}_{k}^{2}$ for the best MC solution (natural order from left to right).

| Coefficients | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 1–8 | 0.0005 | 0.0013 | 0.0029 | 0.0150 | 0.0288 | 0.0297 | 0.0936 | 0.0865 |
| 9–16 | 0.0403 | 0.0781 | 0.0707 | 0.0877 | 0.0875 | 0.0328 | 0.0547 | 0.0404 |
| 17–24 | 0.0259 | 0.0227 | 0.0326 | 0.0322 | 0.0004 | 0.0048 | 0.0273 | 0.0304 |

**Table 7.** First 12 coefficients ${\alpha}_{k}^{2}$ of the survival functions for the cases $sum<1$ and $sum=1$ (the remaining coefficients are 0).

| Case | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $sum<1$ | 0 | 0 | 0.0046 | 0 | 0 | 0 | 0.1492 | 0 | 0.1535 | 0 | 0.6395 | 0 |
| $sum=1$ | 0 | 0.0013 | 0 | 0 | 0 | 0 | 0.1903 | 0 | 0.1216 | 0 | 0.3313 | 0.3555 |

**Table 8.** Relative error between the exact solution (Section 3.2) and the approximated solution using GA, in terms of the maximum number of iterations (`maxiter`, columns) and the population size (`popSize`, rows).

| popSize \ maxiter | 5000 | 25,000 | 50,000 | 100,000 |
|---|---|---|---|---|
| 100 | 0.42 | 0.26 | 0.22 | 0.18 |
| 250 | 0.38 | 0.23 | 0.19 | 0.17 |
| 500 | 0.33 | 0.22 | 0.18 | 0.16 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Calabuig, J.M.; García-Raffi, L.M.; García-Valiente, A.; Sánchez-Pérez, E.A. Evolution Model for Epidemic Diseases Based on the Kaplan-Meier Curve Determination. *Mathematics* **2020**, *8*, 1260.
https://doi.org/10.3390/math8081260
