SEIR Modeling, Simulation, Parameter Estimation, and Their Application for COVID-19 Epidemic Prediction †

Abstract: In this paper, we consider the SEIR (Susceptible-Exposed-Infectious-Removed) model for studying COVID-19. The main contributions of this paper are: (i) a detailed explanation of the SEIR model and the significance of its parameters; (ii) calibration and estimation of the model parameters from observed data, using nonlinear least squares (NLS) optimization and a Bayesian estimation method; (iii) once the parameters are estimated, use of the model to predict the spread of the virus and to compute the probable numbers of infections and deaths; (iv) an evaluation of the performance of the proposed method on simulated and real data; (v) having observed that the fixed-parameter model could not give satisfactory results on real data, the proposal of a time-dependent parameter model, which is then implemented and applied to real data.


Introduction
Mathematical models and computer simulations are useful experimental tools for building and testing theories, assessing quantitative conjectures, determining sensitivities to changes in parameter values, and estimating key parameters from data. Understanding the transmission characteristics of infectious diseases in communities, regions, and countries can lead to better approaches to controlling the transmission of those diseases. Mathematical models are utilized in comparing, planning, implementing, evaluating, and optimizing numerous detection, prevention, therapy, and management programs. Epidemiology modeling can contribute to the design and analysis of epidemiological surveys, recommend crucial data that should be collected, determine trends, make general forecasts, and estimate the uncertainty in forecasts [1][2][3][4].
There exist a number of models for infectious diseases; among compartmental models, these range from the classical SIR model to more complicated proposals [5]. Many research works have been reported, showing that the SIS, SIR, and SEIR models can replicate the dynamics of various epidemics. These models have also been used to model COVID-19 [6][7][8]. To mention a few, Tang et al. [9] investigated a general SEIR epidemiological model in which quarantine, isolation, and treatment are considered. Wang et al. [10] applied phase-adjusted estimation to the number of COVID-19 cases in Wuhan.
Our main contribution is to consider the SEIR compartmental model, explain the significance of its parameters, present methods for estimating and calibrating them, and finally use the model for prediction. For parameter estimation, we used two methods: deterministic nonlinear least squares (NLS) optimization and a Bayesian estimation method. Once the parameters are estimated, we use the model to predict the spread of the virus and calculate the probable numbers of infections and deaths. We show the performance of the proposed method on simulated data. Having observed that the fixed-parameter model could not provide satisfactory results on real data, we propose the use of a time-dependent parameter model.
The rest of this paper is organized as follows. Section 2 describes the construction of the mathematical SEIR model and the meaning of its parameters. Section 3 focuses on the estimation of the parameters of the SEIR model using simulated data. Section 4 focuses on the application of this model to the available data, and shows how to adapt the parameter estimation to those data. Having observed that the evolution of the COVID-19 epidemic cannot be modeled by a fixed-parameter model, we adopt a time-varying parameter model and adapt it to the available data. Section 5 presents the main conclusions.

A Continuous-Time SEIR Model
SEIR is a classical model for epidemics. It is composed of four compartments, shown in Figure 1. The dynamic equations relating these compartments are given by:

$$
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N}, \\
\frac{dE}{dt} &= \frac{\beta S I}{N} - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I, \\
\frac{dR}{dt} &= \gamma I.
\end{aligned}
$$

In this model:
• Susceptible (S) is the number of people susceptible to contracting the infection;
• Exposed (E) is the number of individuals who have been exposed to the infection but are not yet infectious;
• Infectious (I) is the number of individuals who are infectious;
• Recovered (R) is the cumulative number of individuals who have recovered from the disease.

The total population at time t is N = S + E + I + R. The parameters have the following meaning: β is the infection rate or speed of spread, σ is the incubation rate, i.e., the rate at which latent individuals become infectious (the average incubation period is 1/σ), and γ is the recovery (or mortality) rate. If the duration of infection is D, then γ = 1/D. The model has the initial conditions S(0) > 0, I(0) ≥ 0, E(0) ≥ 0, and R(0) ≥ 0. Figure 2 shows an example simulation of this model.
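A forward simulation of the model can be sketched in a few lines of Python. The sketch below assumes the standard SEIR equations with an incidence term βSI/N and integrates them with a fixed-step classical Runge-Kutta scheme. The function names, the step size, and the parameter values are illustrative assumptions, not the settings used for Figure 2.

```python
def seir_rhs(state, beta, sigma, gamma):
    """Right-hand side of the SEIR equations for state = (S, E, I, R)."""
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n   # flow S -> E
    new_infectious = sigma * e       # flow E -> I
    new_removed = gamma * i          # flow I -> R
    return (-new_exposed,
            new_exposed - new_infectious,
            new_infectious - new_removed,
            new_removed)


def rk4_step(state, h, beta, sigma, gamma):
    """Advance the state by one classical Runge-Kutta step of size h (days)."""
    def shifted(k, scale):
        return tuple(x + scale * dx for x, dx in zip(state, k))

    k1 = seir_rhs(state, beta, sigma, gamma)
    k2 = seir_rhs(shifted(k1, 0.5 * h), beta, sigma, gamma)
    k3 = seir_rhs(shifted(k2, 0.5 * h), beta, sigma, gamma)
    k4 = seir_rhs(shifted(k3, h), beta, sigma, gamma)
    return tuple(x + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_seir(x0, beta, sigma, gamma, days, steps_per_day=10):
    """Return the list of daily states [x(0), x(1), ..., x(days)]."""
    h = 1.0 / steps_per_day
    state = tuple(float(v) for v in x0)
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, h, beta, sigma, gamma)
        trajectory.append(state)
    return trajectory


# Illustrative values only (not the estimates reported in this paper).
trajectory = simulate_seir(x0=(990.0, 10.0, 0.0, 0.0),
                           beta=0.5, sigma=0.2, gamma=0.1, days=160)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print("peak of I on day", peak_day, "final state", trajectory[-1])
```

Because the four right-hand sides sum to zero, the total N = S + E + I + R stays constant along the trajectory, which is a convenient sanity check for any implementation.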

Calibration of the Epidemic Model
With such a set of differential equations, the mathematical problems to solve fall mainly into two categories: forward and inverse problems. The forward problem consists of computing the outputs when the parameters and the initial conditions are fixed. The inverse problems consist of: (i) estimating the parameters when a set of observable data related to the outputs is given; (ii) predicting unobserved parts of the outputs (future values) from the observed (past) data once the parameters are estimated. Let us first explain these in a more formal and general mathematical formulation.

Forward Problem
Consider now the following general SEIR model:

$$\dot{x}(t) = \chi(x(t), \theta),$$

where:
• x(t) = [S(t), E(t), I(t), R(t)]^T is the 4-dimensional state vector;
• the initial state x_0 = x(0) is a constant 4-dimensional vector;
• θ = [β, σ, γ] is the 3-dimensional vector of the model parameters.

The forward problem consists of computing the model response x(t) given the initial conditions x_0 and a set of parameters θ.

Inverse Problem
The inverse problem is the opposite of the forward problem: given a set of observed data z(t), t = 1, ..., M, we try to estimate the parameter θ, then the state vector x(t) for all t, and thus the unobserved values of z(t), t > M (prediction). There are mainly two approaches to inverse problems: deterministic and probabilistic. In the following, we describe these two approaches through two methods: (i) nonlinear least squares (NLS) and (ii) the Bayesian maximum a posteriori (MAP).

Nonlinear Least Squares (NLS) Solution
Figure 3 schematically shows the forward and inverse problems in a deterministic approach. Here, we assume that the initial conditions x_0 are known.
The nonlinear least squares (NLS) method is used to numerically approximate a solution of the inverse problem (3) by least-squares fitting: we look for the value θ* of the model parameter θ that minimizes the least squares criterion

$$\theta^{*} = \arg\min_{\theta} J(\theta), \qquad J(\theta) = \sum_{t=1}^{M} \left\| z(t) - \varphi\big(\theta, x(t)[\theta]\big) \right\|^{2}, \qquad (4)$$

where x(t)[θ] is the output of the model for a given set of parameters θ and z = φ(θ, x(t)) is the observable output. This is clearly a nonlinear least-squares problem, since the solution depends on the parameter θ through a highly nonlinear system of differential equations.
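The NLS estimation can be illustrated on simulated data with `scipy.optimize.least_squares`. This is a minimal sketch under several assumptions that are not taken from the paper: the forward model is a fixed-step Runge-Kutta integration of the standard SEIR equations, the observation operator φ is assumed to return the pair (I, R), the data are noise-free, and the initial guess, the bounds, and all names (`seir_forward`, `observe`, `residuals`) are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def seir_forward(theta, state0, n_days, steps_per_day=4):
    """Forward problem: daily states x(t), shape (n_days + 1, 4), for theta = (beta, sigma, gamma)."""
    beta, sigma, gamma = theta

    def rhs(x):
        s, e, i, r = x
        new_exposed = beta * s * i / x.sum()
        return np.array([-new_exposed,
                         new_exposed - sigma * e,
                         sigma * e - gamma * i,
                         gamma * i])

    h = 1.0 / steps_per_day
    x = np.asarray(state0, dtype=float)
    out = [x.copy()]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            # Classical fixed-step RK4 update.
            k1 = rhs(x)
            k2 = rhs(x + 0.5 * h * k1)
            k3 = rhs(x + 0.5 * h * k2)
            k4 = rhs(x + h * k3)
            x = x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(x.copy())
    return np.array(out)


def observe(x):
    """Assumed observation operator phi: only I(t) and R(t) are observed."""
    return x[:, 2:4]


def residuals(theta, state0, z):
    """Stacked residuals phi(theta, x(t)[theta]) - z(t); least_squares minimizes their squared norm."""
    return (observe(seir_forward(theta, state0, len(z) - 1)) - z).ravel()


state0 = [990.0, 10.0, 0.0, 0.0]           # known initial conditions x_0
theta_true = np.array([0.5, 0.2, 0.1])     # illustrative "true" parameters
z = observe(seir_forward(theta_true, state0, 60))  # noise-free synthetic data

theta0 = np.array([0.3, 0.3, 0.2])         # hypothetical initial guess
fit = least_squares(residuals, theta0, bounds=(1e-6, 5.0), args=(state0, z))
print("estimated theta:", fit.x, "final cost:", fit.cost)
```

Note that `fit.cost` is half the sum of squared residuals, i.e., J(θ)/2 in the notation of criterion (4). With real, noisy data the minimizer may depend on the initial guess, so several starting points are worth trying.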

Bayesian Estimation Framework
In a Bayesian framework, we may first interpret the criterion in Equation (4) as, up to a scale factor and an additive constant, the negative log-likelihood of the parameters θ given the data:

$$p(z \mid \theta) \propto \exp\left[-\frac{1}{2v} J(\theta)\right].$$

This means that we assume there is some uncertainty in the observed data z(t) and that the only thing we know about these errors is that they are centered (zero mean) and independent and identically distributed with fixed variance v. In fact, this prior knowledge is translated by the maximum entropy principle (MEP) into a Gaussian distribution [11].
We may then assign a prior p(θ) to translate any prior knowledge we may have about θ. This can be just flat priors on the parameters or, alternatively, reference, noninformative, or conjugate priors. In this work, as all the parameters are rates (infection rate, incubation rate, and recovery rate), we used flat priors for them.
When the likelihood and the prior are fixed, we can use Bayes' rule to obtain the posterior law and the MAP estimate:

$$p(\theta \mid z) \propto p(z \mid \theta)\, p(\theta), \qquad \hat{\theta}_{MAP} = \arg\max_{\theta} p(\theta \mid z) = \arg\min_{\theta} J_{MAP}(\theta), \qquad (8)$$

where J_MAP(θ) = J(θ) + λR(θ), J(θ) is given by (4), R(θ) ∝ −ln p(θ), and λ > 0 is a hyperparameter depending on the variance of the errors v and some parameters of the prior law.
Comparing the expressions in (4) and (8), we see that the MAP solution can be considered as a regularized version of the standard NLS solution, with λ as the regularization parameter. In our case, as we have chosen flat priors, ln p(θ) is constant, and so criterion (8) becomes equivalent to criterion (4).
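The relation between the two criteria can be made concrete with a small sketch. The functions below evaluate J(θ) and J_MAP(θ) = J(θ) + λR(θ) from a list of residuals. The flat prior reproduces the NLS criterion exactly, while the Gaussian prior is a purely hypothetical alternative (this work uses flat priors) included only to show how a non-flat prior acts as a regularization term. The toy residuals and parameter values are made up.

```python
def j_nls(residuals):
    """Least squares criterion J(theta), computed from the residuals z(t) - phi(theta, x(t))."""
    return sum(r * r for r in residuals)


def j_map(residuals, theta, lam, neg_log_prior):
    """Regularized criterion J_MAP(theta) = J(theta) + lam * R(theta)."""
    return j_nls(residuals) + lam * neg_log_prior(theta)


def flat_prior(theta):
    """Flat prior: -ln p(theta) is a constant, taken as 0 here."""
    return 0.0


def gaussian_prior(theta, mean=(0.5, 0.2, 0.1), std=(0.5, 0.2, 0.1)):
    """Hypothetical independent Gaussian prior: -ln p(theta) up to an additive constant."""
    return sum(0.5 * ((t - m) / s) ** 2 for t, m, s in zip(theta, mean, std))


toy_residuals = [0.3, -0.1, 0.2]   # made-up residuals for some theta
theta = (0.6, 0.25, 0.1)           # made-up (beta, sigma, gamma)

print("J        =", j_nls(toy_residuals))
print("J_MAP    =", j_map(toy_residuals, theta, 2.0, flat_prior), "(flat prior)")
print("J_MAP    =", j_map(toy_residuals, theta, 2.0, gaussian_prior), "(Gaussian prior)")
```

With the flat prior the penalty vanishes for every θ, so minimizing J_MAP and minimizing J select the same estimate.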
There are many other Bayesian estimators, for example, the posterior mean, but its computation requires MCMC sampling methods, whose computational cost is very high for this kind of application. We stay here with the more classical criteria of chi-squared, AIC (Akaike Information Criterion), and BIC (Bayesian Information Criterion), for which standard computations are generally included in many available optimization tools, for example, in Python packages [12]. A Jupyter notebook of all the simulations of this paper will be available on GitHub after its publication.
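As an illustration of these fit statistics, the sketch below computes chi-squared, reduced chi-squared, AIC, and BIC from a list of residuals. It assumes one common least-squares convention (Gaussian errors of unknown variance, additive constants dropped), which may differ from the convention of the package cited in [12]. The function name and the toy residuals are hypothetical.

```python
import math


def fit_statistics(residuals, n_params):
    """Chi-squared, reduced chi-squared, AIC and BIC for a least-squares fit.

    Assumed convention: AIC = N ln(chi2 / N) + 2 k and BIC = N ln(chi2 / N) + k ln(N),
    with N data points and k fitted parameters.
    """
    n = len(residuals)
    chisq = sum(r * r for r in residuals)
    return {
        "chisq": chisq,
        "reduced_chisq": chisq / (n - n_params),
        "aic": n * math.log(chisq / n) + 2 * n_params,
        "bic": n * math.log(chisq / n) + n_params * math.log(n),
    }


# Made-up residuals from a hypothetical fit with two parameters.
stats = fit_statistics([1.0, -1.0, 2.0, 0.0], n_params=2)
print(stats)
```

Lower AIC or BIC values indicate a better trade-off between goodness of fit and number of parameters, which is one way to compare a fixed-parameter model with a time-dependent one.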

Adaptation of the Inversion Methods to Available Data
Now we confront the modeling of the previous section with the accessible data, which are not a whole [S, E, I, R](t) time series but a subset or an alternative combination of them. As we work with the COVID-19 data from Johns Hopkins University on GitHub [13], the available data are:
• Confirmed (C);
• Dead (D);
• Recovered (R);
given as cumulative integer-valued time series for days from 22 January 2020. Note that these values are absolute numbers, not relative to a total population. Note also that the unconfirmed cases and the Susceptibles are not accessible at all, whereas the Confirmed contain the Dead and the Recovered from earlier days. In other words, the Johns Hopkins data supply C, D, and R individually but do not provide the most necessary variables. We therefore use a model that works with the Johns Hopkins data without being able to use S in the SEIR model. Since the SEIR model does not distinguish between recoveries and deaths, we set the following relations:

$$R_{Model} = R + D, \qquad I_{Model} = C - R - D.$$

Now, returning to the SEIR model, we obtain relations that define time series γ_n and q_n, which model γ and q = βS_Model/N without knowing the full SEIR state.

Figure 4 indicates the conversion from S-E-I-R to C-D-R, which is used to adapt the estimation of the parameters to the available data.

Numerous extensions have been proposed to improve the model. The first category extends the model with a further block to account for the fact that the total population is not fixed [7,14–22]. The second category proposes time-dependent parameters [23][24][25][26].
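The adaptation to cumulative (C, D, R) series can be sketched as follows. The code assumes the relations R_Model = R + D and I_Model = C − R − D, and it approximates a removal-rate series by the discrete analogue of dR/dt = γI, namely γ_n = (R_Model[n+1] − R_Model[n]) / I_Model[n]. Both the discretization and the function names are assumptions made for illustration, and the numbers are invented, not Johns Hopkins data.

```python
def to_model_series(confirmed, dead, recovered):
    """Map cumulative (C, D, R) series to the model series I_Model and R_Model.

    Assumed relations: R_Model = R + D and I_Model = C - R - D.
    """
    r_model = [d + r for d, r in zip(dead, recovered)]
    i_model = [c - rm for c, rm in zip(confirmed, r_model)]
    return i_model, r_model


def removal_rate_series(i_model, r_model):
    """Daily estimates gamma_n = (R_Model[n+1] - R_Model[n]) / I_Model[n].

    Days with no active cases give None, since the ratio is undefined there.
    """
    rates = []
    for n in range(len(r_model) - 1):
        if i_model[n] > 0:
            rates.append((r_model[n + 1] - r_model[n]) / i_model[n])
        else:
            rates.append(None)
    return rates


# Invented cumulative counts for five consecutive days.
confirmed = [10, 20, 40, 70, 100]
dead = [0, 0, 1, 2, 4]
recovered = [0, 2, 5, 12, 24]

i_model, r_model = to_model_series(confirmed, dead, recovered)
gamma_n = removal_rate_series(i_model, r_model)
print("I_Model:", i_model)
print("R_Model:", r_model)
print("gamma_n:", gamma_n)
```

A strongly varying γ_n series on real data is one indication that a fixed-parameter model will struggle to fit the whole observation period.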

Constructing Predictions
To use the proposed model for prediction, we first estimate the parameters of the model from the real available data, following the method presented in Figure 5 and using either the nonlinear least squares or the Bayesian method. Then, once the model has been identified from the data, we can use it for prediction. More precisely, we first consider that we have real data over the horizon 1, ..., T, where 1 denotes the first day of the estimation period and T the last one. We initialize the state of the system x(t_0), with t_0 = 1, from the real data. Then, with an initial value for the parameters, we compute x(t) = [S(t), E(t), I(t), R(t)] for t > t_0, then z(t) = [C(t), D(t), R(t)], and finally the value of the criterion J(θ), which is then iteratively optimized to obtain the final estimate of the parameters. When the parameters are obtained, we can use them to predict the state x(t), and then z(t), for any time interval, in particular for any t > T.

Analysis of COVID-19 Data for France
Daily overall incidences of COVID-19 cases were collected from information sources on GitHub. The records showed the date of the first positive Corona test, the date of release from hospital, and typically, the date of the first negative Corona test. Cases that did not arise via transmission started at t = 0, and cases that arose via transmission were marked on the day of the first positive Corona test. The optimal parameter values obtained using the procedure given in Section 3.3 are β = 6.52731360, σ = 0.00304030, γ = 0.57801349. In Figure 5, we present the results of simulations of the numbers of S, I, and R over time for France with a fixed-parameter model.
However, the fixed-parameter model could not give satisfactory results on real data, so we used a model with a time-dependent parameter β, specifically a simple periodic function β(t). In this function, the parameter a denotes the baseline or average transmission rate, ω is the period of the forcing, and b represents the amplitude of the forcing.
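A time-dependent transmission rate can be plugged into the SEIR equations with little extra code. The sketch below assumes, purely for illustration, the periodic form β(t) = a(1 + b cos(2πt/ω)), which has mean a, relative amplitude b, and period ω. This specific formula, the function names, and the numerical values are assumptions and may differ from the function actually used for the results on French data.

```python
import math


def beta_t(t, a, b, omega):
    """Assumed periodic transmission rate: mean a, relative amplitude b, period omega (days)."""
    return a * (1.0 + b * math.cos(2.0 * math.pi * t / omega))


def seir_rhs(t, state, a, b, omega, sigma, gamma):
    """SEIR right-hand side with the time-dependent infection rate beta(t)."""
    s, e, i, r = state
    new_exposed = beta_t(t, a, b, omega) * s * i / (s + e + i + r)
    return (-new_exposed, new_exposed - sigma * e, sigma * e - gamma * i, gamma * i)


def simulate(x0, a, b, omega, sigma, gamma, days, steps_per_day=10):
    """Fixed-step RK4 integration of the non-autonomous system; returns daily states."""
    h = 1.0 / steps_per_day
    args = (a, b, omega, sigma, gamma)
    t = 0.0
    state = tuple(float(v) for v in x0)
    out = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = seir_rhs(t, state, *args)
            k2 = seir_rhs(t + h / 2, tuple(x + h / 2 * k for x, k in zip(state, k1)), *args)
            k3 = seir_rhs(t + h / 2, tuple(x + h / 2 * k for x, k in zip(state, k2)), *args)
            k4 = seir_rhs(t + h, tuple(x + h * k for x, k in zip(state, k3)), *args)
            state = tuple(x + h / 6 * (p + 2 * q + 2 * u + v)
                          for x, p, q, u, v in zip(state, k1, k2, k3, k4))
            t += h
        out.append(state)
    return out


# Illustrative values only (not the estimates reported in this paper).
states = simulate(x0=(990.0, 10.0, 0.0, 0.0), a=0.4, b=0.5, omega=30.0,
                  sigma=0.2, gamma=0.1, days=120)
print("beta range:", beta_t(15.0, 0.4, 0.5, 30.0), "to", beta_t(0.0, 0.4, 0.5, 30.0))
print("final state:", states[-1])
```

In an estimation setting, the parameter vector would become θ = [a, b, ω, σ, γ] (or a subset of it), and the same least squares criterion could be minimized over these parameters.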
In Figures 5 and 6, we present the prediction results for the numbers of S, I, and R over time for France with a constant-parameter and a time-varying-parameter model. Then, we used Equation (9) to perform the conversion from S-E-I-R to C-D-R. Figure 7 shows this conversion.

Conclusions and Discussion
The SEIR model we used here appears to be useful for understanding the propagation of the virus. Two main difficulties appeared:
• First, the available data (C, D, R) could not uniquely determine (S, E, I, R). In fact, the mapping from (S, E, I, R) to (C, D, R) is unique, but the inverse is not. Therefore, we had to be careful when directly using the original SEIR model. We explained this in the previous section, where we estimated the parameters of the model from the real (C, D, R) data.
• The second difficulty appeared when we saw that the fixed-parameter model could not predict the real data well. This is because this simple model does not account for many real situations; in particular, a fixed-parameter model does not correspond to reality. Many works have been carried out to overcome these difficulties. Among them, we chose a simple method in which the parameter β varies over time. We used a periodic time dependence for this parameter, as was also done by other researchers [23][24][25][26]. Finally, using appropriate values for the parameters a and b of this time-dependent model, we could obtain good predictions with real data.

Figure 1. The four components of the SEIR model and their relations.

Figure 2. An example simulation of this model.

Figure 3. Forward and inverse problems. Here, we assume that the initial conditions x_0 are known.

Figure 4. Conversion from S-E-I-R to C-D-R.

This straightforward model has at least two drawbacks:
• It assumes that the size of the population N does not change during the epidemic;
• It does not account for possible changes in the context of the disease (the parameters are fixed during the whole analysis).

Figure 6. Prediction results with the SEIR model (constant parameters).

Figure 7. Left: conversion from estimated S-E-I-R to C-D-R (time-varying parameter). Right: conversion from S-E-I-R to C-D-R on real data.