Advances in Parameter Estimation and Learning from Data for Mathematical Models of Hepatitis C Viral Kinetics

Mathematical models, some of which incorporate both intracellular and extracellular hepatitis C viral kinetics, have been advanced in recent years for studying HCV–host dynamics, antivirals mode of action, and their efficacy. The standard ordinary differential equation (ODE) hepatitis C virus (HCV) kinetic model keeps track of uninfected cells, infected cells, and free virus. In multiscale models, a fourth partial differential equation (PDE) accounts for the intracellular viral RNA (vRNA) kinetics in an infected cell. The PDE multiscale model is substantially more difficult to solve compared to the standard ODE model, with governing differential equations that are stiff. In previous contributions, we developed and implemented stable and efficient numerical methods for the multiscale model for both the solution of the model equations and parameter estimation. In this contribution, we perform sensitivity analysis on model parameters to gain insight into important properties and to ensure our numerical methods can be safely used for HCV viral dynamic simulations. Furthermore, we generate in-silico patients using the multiscale models to perform machine learning from the data, which enables us to remove HCV measurements on certain days and still be able to estimate meaningful observations with a sufficiently small error.


Introduction
Viral dynamic mathematical models are useful in our understanding of the in vivo kinetics of a variety of viruses that trigger both persistent infection (e.g., HIV-1 [1][2][3][4][5], HBV [6][7][8], HDV [9][10][11][12], Theiler's encephalomyelitis virus [13], herpes [14] and HCV [15][16][17]) and acute infection (e.g., SARS-CoV-2, influenza A [18] and ebola [19,20]).Mathematical models are also improving our understanding of intracellular viral genome dynamics.The standard model for HCV kinetics during treatment [15] provided helpful insights into the effectiveness and the mechanism of action of interferon-alpha (IFN) as well as ribavirin.The standard model has been applied both retrospectively [21,22] and prospectively, i.e., on treatment [23] to predict the duration of IFN-free antiviral treatment with direct-acting agents (DAAs) [24] needed for HCV eradication (termed time-to-cure).New models were developed to meet the challenges of therapy such as drug resistance [25] and cure after an ultrashort duration of therapy [26].Notably, age-based multiscale mathematical models for HCV dynamics provided a comprehensive understanding of the nature of viral dynamics patterns observed in patients that were treated with IFN and direct-acting agents such as the HCV protease inhibitors [27], or the HCV NS5A inhibitor daclatasvir [28,29].
The standard model [15] is a set of three ordinary differential equations (ODEs) with three variables: the uninfected target cells (T), the productively infected cells (I), and the free virus in blood (V).Age-structured multiscale models [27,28,30] consider the intercellular viral RNA (vRNA) in an additional equation for the RNA variable (R), by introducing age-dependency in addition to time-dependency, bringing it to a partial differential equation (PDE) model.
We provided a stable and efficient numerical method for multiscale models [31].Unlike the use of numerical methods in some other applications, for example, in nonlinear diffusion in image processing [32,33] where the accuracy can be limited, here it was advisable to use a stable and efficient scheme belonging to the Runge-Kutta family of numerical schemes with at least a fourth-order of accuracy.Moreover, due to the properties of the model differential equations that are stiff and also an additional integral that is required to be evaluated at every time step, implicit numerical solvers with an adaptive stepsize are significantly more efficient and more stable than the fourth-order Runge-Kutta.Thus, we have chosen to implement an implicit scheme with an adaptive stepsize [34] that is highly efficient and, at the same time, stable for use in multiscale models with the age of HCV dynamics.Furthermore, we provided a suitable method for our numerical work for parameter estimation in [35].To complement the work in [35], we also wish to address the sensitivity of the parameters in HCV mathematical models by performing a rigorous sensitivity analysis.Finally, we recently used machine learning methods to predict the time for virus elimination from a patient's extracellular fluid (termed time-to-cure) under DAA therapy in in-silico patients, which were generated by the biphasic model and assessed the accuracy of the predictions [36].In the current study, we aimed to generate in silico patients by the multiscale model (that predicts more than two phases of viral decline under therapy) and use machine learning methods to estimate time-to-cure and compare them with the predictions of the biphasic model.The multiscale model incorporates possible drug effects on the HCV intracellular lifecycle that the biphasic model does not, namely blocking assembly/secretion, vRNA replication or enhancing vRNA degradation, with the motivation to better understand and simulate the mechanisms that govern the early virologic response to treatment.Being a more sophisticated model, the parameters are more difficult to estimate with the multiscale model relative to the biphasic model [35].In the following sections, we briefly describe both the biphasic and multiscale models along with their sensitivity analysis and their time-to-cure predictions.We start with the descriptions of the models, continue to their sensitivity analysis, and then we perform learning from data to predict time-to-cure, after which we follow with a discussion of the results that were obtained.

The Biphasic Model
The standard biphasic model for HCV kinetics during treatment [15] is described first.The three variables that this model keeps track of are target cells T, in Equation ( 1), infected cells I in Equation ( 2) and free virus V in Equation ( 3).The target cells T are produced at a constant rate s, die at a per capita rate d and are infected by the virus V at a constant rate β.The infected cells I grow with new infections at a rate βV(t)T(t) and die at a constant rate δ.The virus V is produced at a rate p by each infected cell.It is cleared at a constant rate c.The ϵ term represents the effectiveness of the treatment, which decreases the production from p to (1 − ϵ)p.The system of ODEs for the biphasic model can be written as: This ODE model is mathematically much simpler as compared to the PDE multiscale model described next.Although the model is nonlinear, it can be solved analytically if we assume that T is constant (i.e., target cells remain constant during antiviral therapy) [15].The stability analysis of the model was addressed in [37].

The Multiscale Model
The multiscale model of HCV kinetics [27,28,30] uses the standard biphasic model that considers uninfected target cells in the first equation (Equation ( 4)), infected cells in the second equation (Equation ( 5)), and free virus in the third equation (Equation ( 6)), while also considering vRNA (R) dynamics in an infected cell in the fourth equation (Equation (7).The first three equations comprise the standard biphasic model.The multiscale model is described by: R ˙(a) = − [ρ(a) + μ(a)]R(a) + α(a), (7) where in the fourth equation we have also used Newton's notation for the time derivative, even though the time variable in this equation is not t but the age, a.The four equations can be equivalently formulated in Leibniz's notation, for simplicity, as follows [28]: ∂I(a, t) ∂t + ∂I(a, t) ∂a = − δ(a)I(a, t) dV (t) dt subject to the auxiliary conditions I(0, t) = βV(t)T(t), I(a, 0) = I(a), R(0, t) = 1, and Model parameters of T, in Equation ( 8), and I, in Equation ( 9), are similar to the biphasic model.The time variable a is the age of infection and the time variable t is the time duration from therapy initiation.The quantity of free virus V that is shown in Equation (10) depends on the number of the assembled and released virions, as well as their clearance rate c.The quantity of intracellular vRNA R, Equation (11), depends on its synthesis α and degradation μ and its expulsion from the cell ρ.Antiviral treatment blocks the intracellular synthesis of R (parameter ϵ α ) and/or blocks R secretion (parameter ϵ s ), and/or enhances R degradation.

Sensitivity Analysis
To understand the robustness of simulation results obtained when applying the methods of [31,35], we performed a sensitivity analysis on the biphasic and multiscale models.The idea behind the analysis is as follows.We want to track how different parameters affect the outcome of our model, in this case, the number of free virions in the blood after two days.We will generate a large ensemble of well-distributed values over a certain range for each parameter and randomly match them to create a set of randomly initialized parameters.For each of those sets, we then compute our desired outcome.Instead of comparing the raw values, we will compare their rankings.These rank variations determine which parameters are the most influential.In both models, the analysis was performed on the number of free virions in the blood after two days.We followed the protocol described in [42] to evaluate the partial rank correlation coefficient (PRCC) of the different parameters.It is interesting to note that the partial rank correlation coefficient analysis is based on Pearson's correlation that is used to derive the histogram correlation [43] as employed in [44].Latin hypercube sampling was performed on 1000 samples, where each parameter was evaluated between half and twice its value.A negative PRCC indicates that an increase in the parameter decreases the number of free virions after two days, while a positive one indicates that it increases it.

Sensitivity Analysis in the Biphasic Model
For the biphasic model, three parameters are present: the clearance rate c, the infected-cell loss rate δ and the effectiveness of the blocking of viral production by the treatment ε.
Having as an initial value the parameter values that are depicted in Table 1, we performed uniform Latin hypercube sampling in the range 1/2 to 2 times those values.The results all had highly significant p-values, below 0.001 and are shown in Figure 1.As expected, they are all negatively correlated with the number of free virions.

Sensitivity Analysis in the Multiscale Model
There are twelve parameters in the multiscale model.The central values of each of the parameters are shown in Table 2.All parameters are significantly different from 0 with a p-value below 0.001.
We observe, as shown in Figure 2, that only four parameters are positively correlated.They are the rate of target cell production s, the new infection rate β, the virus assembly/ secretion rate ρ, and the intracellular viral RNA synthesis α.In contrast, the DAA inhibition parameters ε s and ε α , which model the decay of viral replication templates, are negatively correlated as expected.

Machine Learning
Parameter estimation uses data information to calibrate the model.The importance of the data can be further exploited by a machine learning approach that was recently presented in [36] in which only the biphasic model was simulated.To compare the utility of machine learning on the biphasic and multiscale models, we generated two datasets of 3000 in-silico patients based on assumed/estimated parameter space in [27,28].To create the biphasic dataset, the model that was described in [36] was used to generate random parameter values that are uniformly distributed while constrained to the prescribed parameter bounds provided in that reference.To create the multiscale dataset, the model that was described in [35] was used to generate random parameter values that are uniformly distributed while constrained to the following prescribed parameter bounds: β (0.00000005 mL/day/virion), δ (0.14/day), ρ (8.18/day), c (22.3/day), d (0.01/day), α (40.0 vRNA/day), μ (1.0/day), κ (3.5-6.7),γ (0.2-0.96/day), ϵ s (0.4-0.8), ϵ a (0.9-0.999), and s (4.62-6.48log(cells/mL)).For visualization purposes, the dynamics are depicted in Figure 3 generated using our in-house multiscale model HCV simulator [29] with example trajectories in which the parameter ϵ a was simulated with three parameter values: 0.97, 0.985, and 0.999.In Figure 4, we performed a one-way sensitivity analysis of the multiscale model in predicting time-to-cure.
In addition, we provide in Appendix A the figures demonstrating how each one of the parameters that were varied in the one-way sensitivity analysis influence time-to-cure.For each of the generated in silico patients in each dataset, the values for the virus amount during DAA therapy were generated on days 0, 1, 7, 14, 28 (the days at which serum HCV were measured in the proof of concept pilot study [24]).The time-to-cure (defined as one virus copy in the entire extracellular body fluid as used in [24]) was calculated for each patient using the biphasic and multiscale models.
Next, machine learning was applied in order to predict the time-to-cure of the patients in both datasets.This was performed using the freely distributed machine learning Weka software, which is freely available at https://waikato.github.io/weka-wiki/downloading_weka/, accessed on 11 April 2022.Several machine learning methods were tried, including linear regression, support vector machine, neural networks and randomforest.The different classifiers were tested using default parameters, and we found that neural networks provided the best results in all our experiments [45][46][47], which were sufficient for our needs, although it might be possible to reach even better error values if more sophisticated methods such as deep learning were tried.
For checking the efficiency of the machine learning method when applied to each in-silico dataset, we utilized a customary 66-34% framework for the train-test scheme.We trained with data of 66% of the patients (our 'train' set) in order to estimate the time-to-cure on untrained data of 34% of the patients (our 'test' set).It was faster than 10-fold validation but provided similar errors to when the data set is large.
If we include all HCV measurements (i.e., at 0, 1, 7, 14, and 28 days) in the biphasic and multiscale datasets and use the neural network machine learning regressor for classification, we obtain about 1.7% (biphasic) and 0.65% (multiscale) error on average in predicting the time-to-cure.In Figure 5, the distribution of relative errors obtained for predicting time-tocure is presented.Opting for an economized protocol in which fewer HCV measurements are used (as theoretically performed in [48]), we removed the days 1 and 7, while remaining only with the days 0, 14, 28.Using only these 3 days, we obtain the distribution of relative errors, as shown in Figure 6.To generate the multiscale results of Figures 5 and 6, it took around 3 days of computing time on a standard PC (Intel(R) Core(TM) i7-7829HQ CPU at 2.90 GHz with 32GB RAM).Furthermore, if we remove days 2 and 28 (not shown in the figures), corroborating [48] specifically, we obtain errors of 3.82% and 1.08%, respectively.It can be observed from these results that the error is about four times less when the multiscale model is used (additionally, one can check Figure 5), which is an even larger ratio than before the removal of days.It can be concluded that for scenarios in which we remove some of the days, the advantage of the multiscale model could reveal itself significantly.

Discussion
Two established hepatitis C viral dynamic models have been examined and compared to each other in terms of parameter estimation and learning from data.The first is the classical biphasic model [15], and the second is the multiscale model [27,28].Before patient data can assist in calibrating the model or estimating important quantities such as time-to-cure, it is      PRCC results of the multiscale model.For visualization purposes, the dynamics with example trajectories in which the parameter ϵ a was simulated with 3 parameter values 0.97, 0.985, and 0.999 is displayed using our in-house multiscale model HCV simulator [29].The other parameters were fixed as follows: β (0.00000005 mL/day/virion), δ (0.   Distribution of relative errors obtained for predicting time-to-cure using machine learning with the biphasic and multiscale models.Measurements of all time points (0, 2, 7, 14 and 28 days) were used for the machine learning.Distribution of relative errors obtained for predicting time-to-cure using machine learning with the biphasic and multiscale models.Measurements of reduced time points (0, 14 and 28 days) were used for the machine learning.
Central values of the biphasic model for the PRCC.The (min, max) of the parameter space are shown.Mathematics (Basel).Author manuscript; available in PMC 2022 October 13.

Figure A2 .
Figure A2.Influence of parameter γ in the multiscale model on time-to-cure.

Figure A3 .
Figure A3.Influence of parameter ϵ s in the multiscale model on time-to-cure.

Figure A4 .
Figure A4.Influence of parameter ϵ a in the multiscale model on time-to-cure.

Figure A5 .
Figure A5.Influence of parameter s in the multiscale model on time-to-cure.

Figure 1 .
Figure 1.Partial rank correlation coefficient (PRCC) results in the biphasic model.