Physics-Informed Neural Networks and Functional Interpolation for Data-Driven Parameters Discovery of Epidemiological Compartmental Models

: In this work, we apply a novel and accurate Physics-Informed Neural Network Theory of Functional Connections (PINN-TFC) based framework, called Extreme Theory of Functional Connections (X-TFC), for data-physics-driven parameters’ discovery of problems modeled via Ordinary Differential Equations (ODEs). The proposed method merges the standard PINNs with a functional interpolation technique named Theory of Functional Connections (TFC). In particular, this work focuses on the capability of X-TFC in solving inverse problems to estimate the parameters governing the epidemiological compartmental models via a deterministic approach. The epidemiological compartmental models treated in this work are Susceptible-Infectious-Recovered (SIR), Susceptible-Exposed-Infectious-Recovered (SEIR), and Susceptible-Exposed-Infectious-Recovered-Susceptible (SEIRS). The results show the low computational times, the high accuracy, and effectiveness of the X-TFC method in performing data-driven parameters’ discovery systems modeled via parametric ODEs using unperturbed and perturbed data.

Deterministic models are the simplest, with fixed input variables. They are also known as compartmental models because the individuals in the population are assigned to different subgroups, or compartments, each of which represents a specific condition of the individual in the epidemic situation [18]. Derivatives in time are used to express the transition rates of individuals from a compartment to another. Thus, the model is constructed as a system of Ordinary Differential Equations (ODEs).
Stochastic models take into account variations in input variables and provide results in terms of probability. Unlike a deterministic one, a stochastic model allows random variations in one or more inputs over time. Therefore, an estimation of the probability distributions of the outcomes can be carried out. Specifically, the variables changing in time can be the exposure risk, recovery rate, and other disease dynamics. Being able to insert the variability of the input data, the stochastic models have a more complex structure than the deterministic ones but manage to be more adherent to reality [19]. A second categorization

Physics-Informed Neural Network and Functional Interpolation
PINNs are machine learning methods that include physics into a data-driven functional representation of input-output pairing collections. As defined by Raissi et al. [24], the term PINN describes NNs that embed the physics as a regularization term in the loss function. For instance, suppose that one aims to do a regression of an experimental dataset employing an NN and that the collected data represents some physical events modeled via a set of DEs. In conventional regression, one would approximate the data utilizing a NN trained to minimize a Mean Squared Error (MSE) as a loss function. Nevertheless, there is no guarantee that the physics phenomena governing the dataset would not be violated. To mitigate this issue, PINNs are introduced to ensure that the physics, modeled via DEs, is added as a penalty to the loss function. This extra term serves as a regularizator that penalizes the training when the DE and its constraints (e.g., Boundary Conditions BCs, and eventually Initial Conditions ICs) are violated. Thus, one can guarantee that the physics underlying the process is not violated. This method is defined as a data-physicsdriven solution of DEs. More specifically, from the physics perspective, this approach enables the training of NNs to learn the solution of DEs in a data-physics-driven fashion. This becomes critical if the DEs do not precisely model the physics of the problem, for example, when uncertain dynamical systems are considered and/or when perturbations are nonnegligible. Conversely, when the purpose is to retrieve parameters governing some physical phenomena modeled via DEs (e.g., single scattering albedo in the radiative transfer equation), one usually refers to data-physics-driven parameters discovery of DEs (i.e., inverse problems) [24]. When data is not available, and consequently the loss function contains the residual of the DEs and its constraints solely, PINNs learn the solutions of problems involving DEs only in a physics-driven fashion.
The major shortcoming of the standard PINNs, as presented by Raissi et al. [24], is that the DE constraints are not analytically satisfied, and consequently, they need to be concurrently learned with the DE solution within the domain. Hence, during the PINN training, we deal with competing objectives: learning the DE hidden solution within the domain and the DE hidden solution on the boundaries. This drives to unbalanced gradients during the network training via gradient-based techniques that prompt PINNs to struggle frequently to accurately learn the underlying DE solution [34]. Gradient-based optimization methods may get stuck in limit cycles or diverge if several competing objectives are present [35,36]. In [34], to surmount this issue, the authors proposed a learning rate annealing algorithm that uses gradient statistics to adaptive assign proper weights to different terms (e.g., DE residuals within the domain and DE residuals on the boundaries) in the PINNs loss function during the training. In this work, we propose to employ a different and more robust PINN model, the Extreme Theory of Functional Connections (X-TFC), that merges NNs and the Theory of Functional Connections (TFC) [23,37]. X-TFC exploits the constrained expressions (CEs) introduced within the TFC to satisfy the boundary constraints analytically.
TFC, elaborated by Mortari [25], is a mathematical framework for functional interpolation where functions are approximated using these CEs. A CE is a functional that is a sum of a free function and a functional that analytically satisfies the constraints despite the choice of the free function [25,38]. TFC has multiple applications. Primarily, TFC is used for the solution of DEs because the CEs eliminate the "curse of the equation constraints" [39][40][41]. Moreover, TFC has already been used to solve different classes of optimal control space guidance problems such as energy optimal landing on large and small planetary bodies [42,43], fuel optimal landing on large planetary bodies [44], energy optimal relative motion problems subject to Clohessy-Wiltshire dynamics [45], and classes of transport theory problems, such as radiative transfer [46] and rarefied-gas dynamics [47]. For tackling DEs, the standard (or Vanilla as defined in [48,49]) TFC method employs a linear combination of orthogonal polynomials, such as Legendre or Chebyshev polynomials [39,40], as a free function. However, using orthogonal polynomials as a free function makes the standard TFC framework suffer from the curse of dimensionality, particularly when solving large-scale PDEs. To overcome this limitation, X-TFC employs a shallow NN trained via the Extreme Learning Machine (ELM) algorithm [50] to represent the free function.
Being a PINN method, X-TFC can solve forward and inverse problems involving parametric DEs with high precision and low computational time. The method for solving direct problems involving parametric DEs is introduced and presented by Schiassi et al. [23]. As previously stated, the focus of this work is to apply the X-TFC for data-driven parameters discovery of compartmental epidemiological models such as SIR, SEIR, and SEIRS. In the remainder of this section, we will explain how the X-TFC is applied to tackle these problems. Such models are systems of ODEs, where the constraints are on the initial values of these systems' solutions. That is, these problems are initial value problems (IVPs). Therefore, in this section, we will also present the step-by-step derivation of the constrained expressions for these problems.

Generality on Neural Networks
Neural Networks (NNs) are one of the key components of the X-TFC framework that will be used to tackle the problems considered in this work. Therefore, for the convenience of the reader, before diving into the detailed explanation on how X-TFC works, we will give some generalities about NNs.
NNs are powerful mathematical tools, inspired by the biological neurosystems of the human brain, originally developed as function approximators for machine learning applications [51][52][53].
NNs are made by artificial neurons and their mutual connections. The output of every neuron is a non-linear function of the weighted sum of its inputs. The neurons are typically arranged into layers, whose number defines the type of NN. NNs with only a single layer of neurons are known as single-layer or shallow NNs. NNs with more than one layer of neurons are called Deep NNs (DNNs). Neurons and layers are not necessarily all connected among them. When all neurons and layers are connected, we generally talk about fully connected NNs (shallow or deep depending on the number of layers). Every layer of a fully connected NN can be mathematically represented as follows where W is the weight matrix, x and z are the input and output vectors, respectively, b is the bias vector, and σ(·) is the activation function, which can be either different or the same for every layer. As previously stated, NNs were originally introduced as function approximators thanks to their ability in interpolation and fitting. An NN function approximator works in a supervised manner, where the training set T = {(x i , y i )} i=1,...,N consists in x i input points and y i output points (which can be affected or not by noise). First, a trial function y(x) is randomly initialized and the loss function can be defined as The training process consists in solving an optimization problem, where the loss function (L) is the objective function that needs to be minimized, and the decision variables are the weights and biases of each layer. Usually, the training is performed via stochastic gradient based methods such as Adam Optimizer [54].

Extreme Theory of Functional Connections (X-TFC)
In this work, we will focus on systems of ODEs (SODEs) used to describe epidemiological compartmental models. In general, we can express ODEs, in their implicit form as, subject to constraints given by initial conditions (IC) and/or boundary conditions (BC). In Equation (3), f = f (x; λ(x)) is the unknown (or latent) solution, with x ∈ D ⊆ R, λ = λ(x) ∈ L ⊆ R m are the parameters governing the ODE (In general, even if it is not reported in the notation, f is a function of x, and it is parametrized by λ, that in general can be x dependent as well.), N [·; λ] is a linear or non-linear operator acting on f and parametrized by λ, ε is the modeling error that is negligible when solving problems where the physics is exactly modeled by the underlying DE, and R is a known term that in general can be x dependent and parametrized by λ as well. The first step in our PINN-TFC based framework is to approximate the latent solution f with a constrained expression, defined within the TFC [25], where A(x; λ) analytically satisfies the DE constraints, and B(x, g(x); λ) projects the free function g(x), which is a real valued function, onto the space of functions that vanish at the constraints [37]. In the X-TFC method we chose the free function, g(x), to be a shallow NN, trained via ELM algorithm [50]. That is, where L is the number of hidden neurons, w j ∈ R is the input weights vector connecting the jth hidden neuron and the input nodes, ξ j ∈ R with j = 1, ..., L is the jth output weight connecting the jth hidden neuron and the output node, and b j is the bias of the jth hidden neuron, σ(·) are activation functions, and σ = [σ 1 , · · ·, σ L ] T . According to the ELM algorithm [50], biases and input weights are randomly selected and not tuned during the training, thus they are known hyperparameters. The activation functions, σ(·), are also known, as they are user selected. Thus, the only unknown NN hyperparameters to compute are the output weights ξ = [ξ 1 , · · ·, ξ L ] T . Hence we can write, The step-by-step process to derive the constrained expression is provided in Section 2.2.1. Once f is approximated with a NN, the second step of the X-TFC method is to define the loss functions, where f DATA are the real data that eventually can be perturbed. Once the losses are defined, we need to define the vectors with all the unknowns, that are the ξ coefficients and the parameters governing the equations λ, Now, by combining the losses, an augmented loss function vector is formed as follows, and enforcing that for a true solution, this vector should be equal to 0. This allows the unknowns to be solved via different optimization schemes, e.g., least-square for linear problems [39] and iterative least-squares for non-linear problems [40]. When solving inverse problems for parameter estimation, the iterative least-square method is required. Thus, the estimation of the unknowns is updated at each iteration as follows, where the k subscript refers to the current iteration. In general, the ∆Ξ k term can be defined by performing classic linear least-square at each iteration of the iterative least-square procedure as follows, where J is the Jacobian matrix containing the derivatives of the losses with respect to all the unknowns. One can consider computing the Jacobian either by hand or by means of computing tools, such as Symbolic or Automatic Differentiation Toolboxes. The iterative process is repeated until either of the following conditions are met, where defines some user prescribed tolerance.
In Figure 1, a schematic that summarizes how the X-TFC algorithm works for solving inverse problems is shown. The main steps are also reported here: 1.
Approximate the latent solution(s) with the CE; 2.
Expand with the single layer NN (trained via ELM); 4.
Substitute into the DE (that can be also a system of DEs); 5.
Build the DE losses (that drive the training of the network, informing it with the physics of the problem); 6.
Build the data losses (the data can be provided on the solutions and/or on their derivatives); 7.
Train the network; 8.
Build the approximate solution (with the estimated optimal parameters).

Constrained Expression Derivation
Since this paper focuses on IVPs, for the convenience of the reader, we will present the step-by-step derivation of the constrained expression for these kinds of problems. The interested reader can find the general derivation for a n + 1 dimensional constrained expression either in [37] or [23]. Given a parametric ODE where we have a constraint on the initial value of the solution (i.e., f (0) = f 0 ), the constrained expression for f is the following [25], By imposing the constraint f 0 into Equation (12) we get the following, Now by plugging this results back into Equation (12) we get, where Ω 1 is called switching function. For an IVP with one constraint on f , we have which can be inconsistent with the domain where the activation function are defined. Thus we need to map it into the z ∈ [z 0 , z f ] domain as follows, where the mapping coefficient c is, According to the chain rule of the derivative we then have,

Epidemiological Models Formulation
In this section, the X-TFC formulation for the data-driven parameters discovery of a series of epidemiological compartmental models is explained in detail. The presented models are the SIR [55], SEIR [56], and SEIRS [57], taking into account the vital dynamics and the vaccination (for the SEIR model) [58]. As already mentioned, the goal is to estimate the parameters of our interest through solving inverse problems via a deterministic approach.
Given fixed parameters, by integration, we solve the systems of ODEs to create a synthetic dataset (with and without noise), through which the parameters that govern the physics of the problem can be retrieved. After building the constrained expressions and the loss functions, the Jacobian matrix (the matrix containing the derivatives of the losses with respect to the unknowns) is computed in order to perform the iterative least-squares and estimate the unknowns.

SIR Model
As first problem, we consider the system of differential equations that govern the classic deterministic SIR (Susceptible-Infectious-Recovered) compartmental model, in which individuals in the recovered state gain total immunity to the pathogen, with vital dynamics to take into account the births (that can provide an increase in susceptible individuals) and natural death rates. The DEs governing the SIR model are the following, where N = S + I + R is the total population, µ is the birth and natural death rate (considered equal to maintain a constant population), β is the infectious rate, and γ is the recovery rate. An important parameter to consider is the basic reproduction number R 0 , which represents the ratio between β and γ. If R 0 > 1, an outbreak is going to occur. According to the TFC framework, the latent solutions are approximated with the constrained expressions. That is, The first three loss functions we present take into account the regression over the data. The last three losses drive the training of the NN, informing it with the physics governing the problem. The Loss functions are reported below, To construct the Jacobian matrix J we need to compute the derivatives of the losses with respect to the ξ to compute the approximate solutions of the state variables, whereas the other derivatives are essential to estimate the parameters (in this case β and γ) appearing in the system of Equation (18). The resultant Jacobian matrix has the following form,

SEIR Model
The second problem that we aim to solve is the SEIR (Susceptible-Exposed-Infectious-Recovered) compartmental model. This model, compared to the previous one, takes into account the incubation period of a virus, i.e., the time in which a subject comes into contact with the virus but still does not develop its symptoms. Therefore, the subject is infected but is not yet considered among the infectious. In addition, a vaccination parameter which moves people from the Susceptible to Recovered directly is added. The following is the ODEs system describing the model, where N = S + E + I + R is the total population, µ is the birth and natural death rate (considered equal to maintain a constant population), ν is the vaccination rate, β is the infectious rate, φ is the rate at which an Exposed person becomes Infectious, and γ is the recovery rate. According to the TFC framework, the latent solutions are approximated with the constrained expressions. That is, The first four loss functions we present take into account the regression over the data. The last four losses drive the NN, informing it with the physics governing the problem. The loss functions are reported below, To construct the Jacobian matrix J we need to compute the derivatives of the losses in respect of the ξ to compute the approximate solutions of the state variables, whereas the other derivatives are essential to estimate the parameters (in this case β, γ, and φ) appearing in the system of Equation (18). The resultant Jacobian matrix has the following form,

SEIRS Model
The last problem we present here, is the SEIRS (Susceptible-Exposed-Infectious-Recovered-Susceptible) compartmental model. This model is used in the case when the immunity of recovered individuals wane, and they return to exist in the category of Susceptibles. No vaccination is considered here. This model is governed by the following system of ODEs: where N = S + E + I + R is the total population, µ is the natural deaths rate, ν is the new births rate, β is the infectious rate, φ is the rate at which an Exposed person becomes Infectious, ζ is the rate which Recovered individuals return to the Susceptible statue due to loss of immunity, and γ is the recovery rate. According to the TFC framework, the latent solutions are approximated with the constrained expressions. That is, The first four loss functions we present take into account the regression over the data. The last four losses drive the NN informing it with the physics governing the problem. The loss functions are reported below, To construct the Jacobian matrix J we need to compute the derivatives of the losses in respect of the ξ to compute the approximate solutions of the state variables, whereas the other derivatives are essential to estimates the parameters (in this case β, γ, and φ) appearing in the system of Equation (18). The resultant Jacobian matrix has the following form:

Results and Discussion
To test the ability of the X-TFC in performing data-driven parameters discovery of epidemiological compartmental models, we have created synthetic datasets according to the three models presented above (SIR, SEIR, and SEIRS). In particular, for each model, a no-noisy synthetic dataset (here called original datasetf orig ) has been generated by simply propagating the dynamics equations of the model using the MatLab function ODE113. In addition, to simulate a more realistic example, perturbed synthetic datasets (f pert ) have been created by adding noise to the original dataset. That is, where δ is the perturbation coefficient (equal to 0 for the original dataset) and U (·, ·) represents a uniform distribution. The real values of the parameters governing the synthetic dataset are known, so that the accuracy of the results is measured by the absolute error between the real and estimated values of the parameters. Additionally, the X-TFC method involves several hyperparameters that can be modified to obtain accurate solutions. These hyperparameters are the number of training points, n, the number of neurons, L, the type of activation function, and the probability distribution where input weights and bias are sampled from. Therefore, a sensitivity analysis has been performed to study the behavior of the X-TFC method as these hyperparameters vary. The sensitivity analysis is only shown for the SIR model with no-noisy data, as a similar behavior has been encountered for all the other models considered. First of all, the sensitivity analysis has demonstrated that, for the models analyzed, the solution accuracy is not as sensitive to the type of activation function used or to the probability distribution used to sample the inputs weights and bias as it is to the number of training points and the number of neurons, confirming the results found from the sensitivity analysis reported in [23]. Hence, the two parameters that strongly influence the performances of the X-TFC are n and L. Figure 2a,b refer to the analysis with the original dataset (δ = 0). As illustrated, high values of L (L > 150), with a fixed n, do not lead to an improvement of the accuracy, since Figure 2a presents an asymptotic-like behavior. The same considerations are valid varying n and keeping fixed L (Figure 2b). Indeed, the solution does not significantly improve increasing the number of discretization points. This result is also obtained if a perturbed dataset is considered (see Figure 2d). On the other hand, Figure 2c shows an interesting behavior. The accuracy of the solution gets worse by increasing the number of neurons L. This trend is probably due to the fact that X-TFC tries to overfit the perturbed data so that to diverge too much from the real curves and thus obtaining an inaccurate estimation of the parameters. The rest of this section focuses on the results obtained for each model presented previously. For these problems, the ArcTan activation function and a uniform random distribution ranging within [-10,10] are employed for the ELM.
All the models tackled in this manuscript have been coded in MATLAB R2020a and ran with an Intel Core i7 -9700 CPU PC with 64 GB of RAM.

SIR Model
Here, the results and the performances for SIR problem are shown. The outputs are obtained by setting the following parameters: • Natural mortality rate: µ = 0.1 (set equal to the birth rate, to simulate a constant number of population); • Effective contact rate (possibility to be infected): β = 1 2 ; • Removal rate (how often infected people become recovered): γ = 1 3 ; • Initial conditions: S 0 = 100; I 0 = 5; R 0 = 0; • Analysis time: 15 days.
Several simulations are carried out by varying the intensity of the noise, and the outputs are reported in Table 1. While we could find the exact values of parameters with the original dataset, a slight deviation of these values occurs by increasing the perturbation coefficient δ. However, the absolute errors on the parameters result to have at least two digits of accuracy. Figure 3a,b report the perturbed and real dataset and the solution of the problem for the case of δ = 5, respectively. As it can be seen, the X-TFC is able to obtain an accurate solution avoiding the overfitting on the data, as it could be expected by a simple regression on the perturbed dataset. This is due to the information about the physics of the problem, which acts as a regulator, that are embedded in the physics-informed training framework. The accuracy of the inversion with perturbed dataset is also proved by the constant value of the population N, as it has to be from the theory. Table 1. Performances of the proposed physics-informed framework in the data-driven discovery of the SIR model with different noise on the data, with n = 100 and L = 50.

SEIR Model
Here, the results and the performances for SEIR problem are shown. The outputs were obtained by setting the following parameters: • Natural mortality rate µ = 0.5 (set equal to the birth rate, to simulate a constant number of population); • Vaccine rate ν = 0.5; • Effective contact rate (possibility to be infected) β = 0.3; • Removal rate (how often infected people become recovered) γ = 0.6; • Progression rate from exposed to infected φ = 0.9; • Initial conditions: S 0 = 70; E 0 = 30; I 0 = 10; R 0 = 0; • Days = 15.
Several simulations are carried out by varying the intensity of the noise, and the outputs are reported in Table 2. While we could find the exact values of parameters with the original dataset, a slight deviation of these values occurs by increasing the perturbation coefficient δ. However, the absolute errors on the parameters result to have at least one digit of accuracy. Figure 4a,b report the perturbed and real dataset and the solution of the problem for the case of δ = 3, respectively. Again, the X-TFC is able to obtain an accurate solution avoiding the overfitting on the data, as it could be expected by a simple regression on the perturbed dataset. Table 2. Performances of the proposed physics-informed framework in the data-driven discovery of the SEIR model with different noise on the data, with n = 100 and L = 80, µ = ν = 0.5 (the training time is on the order of milliseconds).

SEIRS Model
Here, the results and the performances for SEIRS problem are shown. The outputs were obtained by setting the following parameters: • Natural mortality rate µ = 0.5 (set equal to the birth rate, to simulate a constant number of population); • Effective contact rate (possibility to be infected) β = 0.3; • Removal rate (how often infected people become recovered) γ = 0.6; • Progression rate from exposed to infected φ = 0.9; • Rate which recovered individuals return to the susceptible statue (due to loss of immunity) ζ = 0.5; • initial conditions: S 0 = 70; E 0 = 30; I 0 = 10; R 0 = 0; • days = 15.
Several simulations are carried out by varying the intensity of the noise, and the outputs are reported in Table 3. While we could find the exact values of parameters with the original dataset, a slight deviation of these values occurs by increasing the perturbation coefficient δ. However, the absolute errors on the parameters result in having at least two digits of accuracy. Figure 5a,b report the perturbed and real dataset and the solution of the problem for the case of δ = 3, respectively. Again, the X-TFC is able to obtain an accurate solution avoiding the overfitting on the data, as it could be expected by a simple regression on the perturbed dataset. Table 3. Performances of the proposed physics-informed framework in the data-driven discovery of the SEIRS model with different noise on the data, with n = 100 and L = 80, µ = 0.5 (the training time is on the order of milliseconds).

Conclusions
In this work, the new PINN framework X-TFC has been employed to solve data driven discovery of DEs, also called inverse problems, via a deterministic approach. In particular, compartmental epidemiological models (SIR, SEIR, SEIRS) have been taken into account as test problems. The goal was to retrieve the parameters governing the dynamics equations considering unperturbed and perturbed data, to better simulate the reality. The tests have shown fairly accurate results even when a significant noise was added to the data. Furthermore, the information about the physics of the problem (considered for the training of the X-TFC) has allowed to avoid the overfitting and thus to obtain good estimations of parameters with noisy data. The low computational times obtained are extremely important to process data as soon as they are acquired, so that the results can be updated in real time. Moreover, the good estimations of parameters allow one to make predictions about the imminent future: this makes it possible to take actions in the short term (as it should be in emergency scenarios, like the COVID-19 pandemic). Future works involve the inversion of models with non-constant parameters (i.e., parameters that follow mathematical laws) as well as probabilistic parameters estimation (via Bayesian inversion) in different research fields, such as business, biology, space, and nuclear engineering.

Conflicts of Interest:
The authors declare no conflict of interest.