Mathematical Modeling of Immune Responses against SARS-CoV-2 Using an Ensemble Kalman Filter

: In this paper, a mathematical model was developed to simulate SARS-CoV-2 dynamics in infected patients. The model considers both the innate and adaptive immune responses and consists of healthy cells, infected cells, viral load, cytokines, natural killer cells, cytotoxic T-lymphocytes, B-lymphocytes, plasma cells, and antibody levels. First, a mathematical analysis was performed to discuss the model’s equilibrium points and compute the basic reproduction number. The accuracy of such mathematical models may be affected by many sources of uncertainties due to the incomplete representation of the biological process and poorly known parameters. This may strongly limit their performance and prediction skills. A state-of-the-art data assimilation technique, the ensemble Kalman ﬁlter (EnKF), was then used to enhance the model’s behavior by incorporating available data to determine the best possible estimate of the model’s state and parameters. The proposed assimilation system was applied on the real viral load datasets of six COVID-19 patients. The results demonstrate the efﬁciency of the proposed assimilation system in improving the model predictions by up to 40%. We also performed a quantitative analysis using the RAE comparing the measured and predicted viral load, and reported the percentage of improvements achieved by the joint-EnKF. The joint-EnKF consistently achieved better results for all patients, leading to an approximately 40% improvement over the model free-run. This study demonstrated the relevance of the proposed assimilation system for enhancing the estimates of the immune response mathematical model.


Introduction
Since being first identified in the late fall of 2019 , the outbreak of coronavirus disease  has received increasing attention among the scientists all over the world in order to understand the epidemiology of the disease and curb the spread of the virus. COVID-19 is a contagious disease caused by the current severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Infected people will experience mild to moderate symptoms such as a cough, headache, fever, and in some cases, severe pneumonia [1,2]. The development of a vaccine to end the disease has undergone significant progress during the last two years. However, the virus is still going on, as the mortality rate and daily confirmed cases remain high in many countries. As efforts to eradicate the disease continue, in addition to biological and medical research, mathematical models have been suggested to play a vital role in describing and forecasting disease transmission, as well as help to implement efficient measures and strategies to control the spread of the pandemic and reduce its impact [3].
Several models have recently been proposed and implemented in order to understand the dynamics of COVID-19. Most of these models are based on the traditional epidemiological model, called the susceptible-exposed-infectious-recovered (SEIR) model, which was introduced to study the pandemic transmission within a human community [4][5][6][7][8][9][10][11][12][13][14]. To date, very few attempts have been devoted to describe the interaction between the immune system and SARS-CoV-2 [15][16][17][18]. An important limitation of these models is the presence of several sources of uncertainties, which may strongly degrade their prediction performance and lead to large inaccuracies in their outputs. Aside from the incomplete description of the biological processes, the poor characterization of the model parameters is also considered a major source of uncertainty. Therefore, it is crucial to quantify and reduce the model's uncertainties before using them for decision making.
Data assimilation techniques provide efficient tools for quantifying and reducing the uncertainties of such models [19,20]. They constrain prior information about the state and parameters of the underlying dynamical system with available data to compute posterior updated estimates of the parameters and state variables [21], based upon which improved predictions can be obtained with the model. This process, which finds its roots in the atmospheric and oceanic weather communities, sequentially nudges model predictions towards incoming observations, so the model trajectory remains as close as possible to the true trajectory [22]. The ensemble Kalman filter (EnKF) is an efficient data assimilation technique that has been successfully implemented in many fields including atmospheric and oceanic weather forecasting [23][24][25][26][27], hydrology and reservoir applications [28][29][30][31], and flood modeling [32,33]. EnKF is a Monte Carlo implementation of the classical linear Kalman filter (KF) [34], operating with cycles of forecast and update steps. In the forecast step, a set of members sampled from the state probability distribution were integrated forward in time with the model. In the update step, a Gaussian Kalman update is applied, once a new datum becomes available [35], from which the next assimilation cycle is initiated [22,35]. Unlike the KF, the EnKF can efficiently handle model nonlinearities without the need to compute tangent linear approximations of the model as for the extended Kalman filter.
In this work, we propose a mathematical model for the innate and adaptive immune responses against SARS-CoV-2 infection. First, we conducted a mathematical analysis to derive the basic reproduction number and discuss the equilibrium states of the model. To improve the model prediction skill, an efficient state-parameters estimation framework based on a joint-EnKF assimilation scheme was developed. By considering uncertainties in the model state variables and poorly known parameters, the proposed ensemble data assimilation system updates both state and parameters ensembles every time a new observation become available. This methodology was validated with real SARS-CoV-2 viral load data obtained from a hospital in Munich. We assessed the sensitivity of the system to the ensemble size and particularly focused on the relevance of the joint-EnKF for improving the prediction capability of the immune response mathematical model.
The rest of the study is organized as follows. Section 2 presents the mathematical model and briefly outlines the joint-EnKF algorithm. Section 3 is dedicated to the mathematical analysis of the model. The numerical results are presented and discussed in Section 4. Key findings and conclusions are summarized in Section 5.

Mathematical Model
The mathematical model describes the interaction dynamics of both the human innate and adaptive immune system with SARS-CoV-2 as follows. Healthy cells H(t) become infected at a rate β. The virus V(t) is produced by infected cells at a rate k 1 . The innate immune system instantly responds after the viral infection. Cytokines C(t) are a crucial element of the innate system and play a major role in inhibiting the viral replication at a rate p 2 . Specific cytokines activate natural killer cells N(t) at a rate r, which play an essential role in eliminating infected cells at a rate p 5 . At the same time, cytokines activate the adaptive immune system, particularly the cytotoxic T-lymphocytes T(t) at a rate λ 1 . T-cells kill virus infected cells at a rate p 1 and afterwards activate B-lymphocytes B(t) at a rate λ 2 . A ratio k of B-cells differentiate into plasma cells in an antigen-dependent manner to produce antibodies at a rate η. They specifically recognize and neutralize the virus at a rate p 3 . A schematic diagram characterizing the proposed model is presented in Figure 1. Finally, the system describing these interactions is expressed as a set of ordinary differential equations: where G is the Heaviside step function defined as follows: where the time delay τ is the time required to produce antibodies from plasma cells. Detailed descriptions and definitions of all the model state and parameters are given in Table 1.

Ensemble Kalman Filter (EnKF)
Consider the following space-time dynamical system: where the model's prognostic state is denoted by x k , M is the nonlinear predictive model and θ denotes a set of modeling parameters. The observations at time t k+1 are given by y k+1 . The state variables are mapped into the observation space using the H operator.
Model errors η k+1 are assumed Gaussian, centered at zero with a prescribed covariance Q k+1 . Observation errors k+1 , on the other hand, are assumed to be uncorrelated with the model errors and further follow N (0, R k+1 ).
During the forecast step, the model is used to integrate an ensemble of states forward in time. The prior sample mean and covariance are then approximated aŝ (5) where N e is the ensemble size. By augmenting the state and parameters-z k+1 = x T k+1 , θ T -the joint-EnKF uses y k+1 to update the prior joint state as follows: such thatM is a modified nonlinear function operating on the augmented state-parameters vector. Using the ensemble members of both the state and parameters, the joint prior covariance can be formed aŝ where X θ represents the departure of the parameter ensemble from the mean, i.e., ensemble anomalies. A linear Kalman update is applied to compute the analysis state and parameter ensembles: where

Mathematical Analysis of the Immune Response Model
This section discusses the equilibrium states, the stability of the disease-free equilibrium, and the basic reproduction number of the immune response model (1).

Disease-Free Equilibrium and Basic Reproduction Number of the Model
Model (1) has a unique disease-free equilibrium X 0 obtained by setting all the righthand side to zero with V = 0 as follows: The basic reproduction number R 0 represents the average number of secondary infected cells produced by a single infected cell in the infection-free environment. We calculated it using the next-generation method [42]. Let F and W denote the matrix of the new infection and the transmission matrix, respectively. At the disease-free equilibrium X 0 , F and W are given by Following [42], the basic reproduction number R 0 is the dominant eigenvalue of the next-generation matrix FW −1 , so that:

Analysis of the Steady States
Model (1) has four steady states: 1. The disease-free equilibrium X 0 . 2. The virus persistence equilibrium in the absence of immune responses given by where: We can clearly see that the equilibrium X 1 exists when R 0 > 1. 3. The virus persistence equilibrium in the absence of the adaptive immune response given by where: and C 2 is the root of the quadratic equation: with: It is clear that a 2 is always positive, therefore, Equation (16) has a unique positive solution when a 0 < 0. Thus, R 0 > 1 ensures the existence of X 2 . 4. The all cells and virus co-existence equilibrium is given by where: From the first equation of system (1), one has: and from the third one: Putting the expressions of H 3 and V 3 into the second equation of system (1), we obtain the following expression for A 3 : where: Finally, the last two equations of system (1) give: Since X 3 should be positive, one can see that the equilibrium X 3 exists if R 0 > R 1 + 1.

Theorem 1.
The disease-free equilibrium X 0 is locally asymptotically stable when R 0 < 1, and unstable when R 0 > 1.

Results and Discussion
The aforementioned model is applied to forecast the SARS-CoV-2 viral load. Real data including daily viral load measurements from six patients in a hospital in Munich (Germany) are obtained from [44]. The assimilation impact is assessed by comparing the data assimilation output obtained using the joint-EnKF to those of the model free-run without assimilation.

Assimilation Settings
The state vector (x) to be assimilated includes nine variables: the number of healthy cells H, infected cells I, viral load V, cytokines C, natural killer cells N, cytotoxic Tlymphocytes T, B-lymphocytes B, plasma cells P and antibody levels A: and the parameters vector (θ) contains all model parameters listed in Table 1, meaning that: In the joint-EnK, the state and parameters are simultaneously estimated. Thus, the augmented state vector becomes: To prevent negative input, all parameters are taken in a logarithmic form during the estimation procedure. The total number of estimated variables by the joint state is 33.
The initial state and parameter values are outlined in Table 1 with p 2 = 0.001, p 3 = 0.05, k 1 = 379, k 2 = 5, η = 0.05, µ 2 = 0.65, µ 3 = 0.9, and τ = 7. To initial-ize the simulation, the state ensemble at time zero is drawn from a truncated Gaussian distribution using the initial state x 0 as: A preprocessing step is performed to ensure all initial state variables are non-negative. The standard deviation of the initial ensemble distribution is chosen as κx 0 , where κ = 10% and σ is a standard normal distribution. Likewise, the parameter ensemble is initialized starting from the nominal parameters of Table 1 in which a log-normal distribution, with a 10% standard deviation, is used to draw the parameter samples. A larger initial uncertainty range is assigned to parameter β where the standard deviation is set to 40%. To prescribe the uncertainty in the data, the observation error covariance at any time t k is assumed diagonal (i.e., observations error correlations in space vanish) with variances set to 5% of the data values.
The prediction skill of the data assimilation framework is evaluated using the following relative absolute error (RAE): where y o (k) denotes the k th value of the observed data, y f (k) is the associated forecast mean and K is the sample size of the observed data at that time.

Sensitivity to Ensemble Size
Initially, we investigated the EnKF sensitivity to the ensemble size N e , which is a key feature of the EnKF. Running the filter using large ensemble sizes is undoubtedly more costly. Therefore, computing reliable state and parameter estimation using small ensembles is often desired. We run the assimilation system using six ensembles with different sizes varying from N e = 25 to 1000, and report the average RAE of the forecasted viral load in Figure 2. As expected, increasing the ensemble size improves the performance of the filter. As shown in Figure 2, the use of 100 ensemble members leads to a nearly 35% decrease in the RAE compared to using 25 members. However, increasing the ensemble size beyond 100 only has a marginal effect on the accuracy (roughly 6%) relative to the increment in the runtime cost. Therefore, we run the rest of our simulations with 100 members.

Assimilation Results
The time series of the viral load from the data, free-run (without data assimilation) and joint-EnKF run for all patients are shown in Figure 3. Compared to the model free-run, the joint-EnKF suggests significant improvements and yields more accurate viral load estimates. Although a few data points are not particularly well estimated by the joint-EnKF, the overall behavior of the viral load is well reproduced by the filter. To allow a quantifiable validation, Table 2 outlines the bias values comparing the measured and predicted viral load of the model free-run and the joint-EnKF for the six patients. The joint-EnKF results are also presented in terms of percentage improvement with respect to the model free-run prediction. The significantly improved filter estimates demonstrate the relevance of the data assimilation of the viral load information into the immune response model.  Closer examination of the results suggests that the viral load measurements for Patient 1 WEre very well predicted by the joint-EnKF. In particular, the viral load peak and the duration of high viral load WEre well captured. The joint-EnKF provides lower RAE compared to the model free-run, with an improvement reaching up to 50%. The model underestimates the peak level and does not accurately predict the rest of the hydrograph. For patients 2, 4 and 5, the viral load was satisfactory predicted by both the joint-EnKF and the model free-run, although the later fails to reproduce the peak level at the end of the Patient 5 hydrograph. The joint-EnKF further improves the prediction of the viral load for those patients by approximately 33%, 20% and 42%, respectively. Both the joint-EnKF and the model free-run overestimate the peak level for Patient 3. While the joint-EnKF accurately predicts the rest of the hydrograph, the model free-run still overestimates the duration of the high viral load. The joint-EnKF is able to improve the viral load prediction by almost 63%. The same conclusion can be drawn for Patient 6, for whom the duration of the high viral load was overpredicted by the model free-run while the rest of the hydrograph is underestimated.
Overall, the performance of the model with and without data assimilation can be summarized as follows: (1) the peak level is well predicted almost everywhere by the filter, while it is occasionally overestimated or underestimated by the model free-run; (2) the model free-run solution is smooth compared to the joint-EnKF, which delineates the viral load dynamic much more accurately; and (3) the joint-EnKF is able to enhance the viral load estimation for all patients, improving the model solution by up to 40% on average. Figure 4 displays the time evolution of selected model parameters as estimated by the joint-EnKF for Patient 1. As seen, all parameters smoothly converge to a biologically acceptable range of values. The initial ensemble shrinks at each iteration showing a reduction of 95% in the standard deviation at the end of the assimilation process. A similar assessment was made for the other parameters. Table 3 presents a summary of the selected parameters estimated by the joint-EnKF for all patients. The time delay for antibody production τ of patients 1, 2 and 5 was approximately 7 days and significantly lower than that for patients 3, 4 and 6 with values of 9.52, 11.96 and 10.17 days, respectively. These results are quite consistent with the viral load hydrograph in Figure 3. As we can see, the viral load of patients 1, 2 and 5 start to decrease after almost 7 days. However, the decrease in the viral load for the other patients takes more time.

Conclusions
This paper formulated a nine-compartment mathematical model for simulating SARS-CoV-2 dynamics and the immune system response in infected patients. The model consists of healthy cells, infected cells, viral load, cytokines, natural killer cells, cytotoxic T-lymphocytes, B-lymphocytes, plasma cells and antibody levels. We computed the basic reproduction number and discussed the equilibrium states of the model. In order to reduce uncertainties in the model and improve its prediction skill, a joint state-parameter estimation scheme based on the EnKF was implemented. The joint-EnKF uses current viral load observations and applies a Kalman-based update to the dynamic state variables and parameters. Finally, we demonstrated the efficiency of the proposed methodology with real SARS-CoV-2 viral load data for six patients, while comparing its performance with the model free-run (i.e., without assimilation).
Sensitivity experiments using different ensemble sizes with the EnKF suggested that 100 members are sufficient to obtain reliable estimates with reasonable computational cost. During assimilation, the parameters' ensembles were constrained by the data where the spread was reduced and eventually converged to a biologically acceptable range of values. In addition, the proposed joint-EnKF was able to reproduce the overall evolution of the dynamic process of the viral load evolution, the peak values, and the duration of the high viral load with significantly better accuracy than the model free-run. We also performed a quantitative analysis using the RAE comparing the measured and predicted viral load, and reported the percentage of improvements achieved by the joint-EnKF. The joint-EnKF consistently achieved better results for all patients, leading to an approximately 40% improvement over the model free-run. This study demonstrated the relevance of the proposed assimilation system for enhancing the estimates of the immune response mathematical model.