Markov Chain-Based Stochastic Modelling of HIV-1 Life Cycle in a CD4 T Cell

: Replication of Human Immunodeﬁciency Virus type 1 (HIV) in infected CD4 + T cells represents a key driver of HIV infection. The HIV life cycle is characterised by the heterogeneity of infected cells with respect to multiplicity of infection and the variability in viral progeny. This heterogeneity can result from the phenotypic diversity of infected cells as well as from random effects and ﬂuctuations in the kinetics of biochemical reactions underlying the virus replication cycle. To quantify the contribution of stochastic effects to the variability of HIV life cycle kinetics, we propose a high-resolution mathematical model formulated as a Markov chain jump process. The model is applied to generate the statistical characteristics of the (i) cell infection multiplicity, (ii) cooperative nature of viral replication, and (iii) variability in virus secretion by phenotypically identical cells. We show that the infection with a ﬁxed number of viruses per CD4 + T cell leads to some heterogeneity of infected cells with respect to the number of integrated proviral genomes. The bottleneck factors in the virus production are identiﬁed, including the Gag-Pol proteins. Sensitivity analysis enables ranking of the model parameters with respect to the strength of their impact on the size of viral progeny. The ﬁrst three globally inﬂuential parameters are the transport of genomic mRNA to membrane, the tolerance of transcription activation to Tat-mediated regulation, and the degradation of free and mature virions. These can be considered as potential therapeutical targets.


Introduction
Infection with the Human Immunodeficiency Virus type-1 (HIV) remains a global problem of public health concern, with more than 70 million people infected since the early 1980s [1].Unfortunately, there are no efficient vaccines against HIV [2].The within-host infection dynamics is characterized by (i) a variability of the disease course in infected individuals [3], (ii) the ability of the virus to rapidly establish a large population of latently infected cells [4], (iii) an increasing genetic diversity of the virus quasi-species [5][6][7], (iv) heterogeneity of the infected cells with respect to the multiplicity of infection [8][9][10][11], and (v) the resulting variability in viral progeny [12].The multi-physics nature of the HIV infection calls for the development of computational models integrating the infectiondriven processes across the systemic physiological-, organ and tissue-, and single-cell levels of resolution.The description of within-cell virus replication is a cornerstone of the development of multiscale mathematical models of HIV infection [13,14].
The virus life cycle in permissive cells represents the primary driver of HIV infection.So far, a number of mathematical models have been developed to simulate the replication of HIV in an infected cell at different levels of details (see for a list [15]).Importantly, the highresolution models represent a deterministic description of the biochemical reaction steps underlying the HIV life cycle.However, many of the replication stages are characterized by low numbers of reactants, and hence there is a strong impact of random effects and fluctuations in the reaction kinetics [16].A systematic analysis of the contribution of stochastic processes to HIV-replication is still missing.
We have recently developed a detailed in silico model of the HIV life cycle in productively infected CD4 T cells [15] formulated with a system of 24 ordinary differential equations (ODE).This deterministic model enables estimation of biochemical parameters of HIV replication.The resulting mechanistic description of the HIV life cycle established a mechanistic basis for the development of a stochastic description of the process kinetics using the Monte Carlo framework.
In this paper, we formulate the stochastic Markov chain-type model of HIV replication in productively infected cells in the form of Gillespie's algorithm [17].We propose a modification of this algorithm that accelerates the computation process by a factor of 200 compared with the classical Gillespie's algorithm.This is essential, because to obtain the comprehensive statistical data, a very large number of realisations of the stochastic process should be computed.
We apply the stochastic model to examine the statistical characteristics of key aspects of single-cell infection: • Cell-to-cell variability in HIV progeny production; • Multiplicity of single-cell infection; • Global sensitivity of specific reaction steps on net virus production.
Our study delineates the contribution of random effects to (i) the multiplicity of infection, (ii) the cooperative nature of viral replication, and (iii) the variability in the scale of virus secretion by phenotypically identical cells.

Governing Deterministic Equations
The HIV-1 replication cycle presented in Figure 1 consists of multiple stages described below.The specific biochemical reactions and model parameters are from our previous work [15].

Virus Entry
The entry stage is split into three steps (see also [15]) represented in Figure 1: 1.
Virion binding to CD4 receptors (the viral glycoprotein gp120 binds to CD4 receptors on the T cell surface); 2.
Binding to the co-receptor CCR5 or CXCR4; 3.
Virus membrane and cell membrane fusion, i.e., the nucleocapsid is uncoated and the viral RNA is injected into the cell.
The above is described by the following equations: where: is the number of free virions outside the cell; is the number of virions bound to CD4 and the co-receptor.
The model parameters represent the rate of virion binding to the CD4+ T cell membrane, the rate of virion fusion with the cell, the clearance rate of free mature virions, and the degradation rate of bound virions, respectively.Their reference values and admissible ranges are specified in [15].

Reverse Transcription
Bound virions fuse with the host cell membrane.This step results in the release of the virion content into the cell cytoplasm setting up the stage of reverse transcription [18].The reverse transcription creates a double-stranded proviral DNA from two single-stranded RNA genomes.
The reverse transcription reaction consists of three sequential processes [15]: 1. Synthesis of minus-strand DNA from viral RNA; 2.
Double-strand DNA formation.
This stage is modelled by the following equations: where: is the number of genomic RNA molecules in the cytoplasm; x 4 = [DNA cor ] is the number of proviral DNA molecules synthesized by reverse transcription.The model parameters k RT = 0.43 h −1 ; k DNA t = 0.12 h −1 ; d RNA cor = 0.21 h −1 ; d DNA cor = 0.03 h −1 ; represent the reverse transcription rate, the transport rate of DNA from cytoplasm to nucleus, the degradation rate of RNA in the cytoplasm and the degradation rate of DNA in the cytoplasm, respectively.Their reference values and admissible ranges are specified in [15].

Integration
The synthesized proviral DNA associates with virus-encoded integrase (IN) and other proteins.This forms a high-molecular-weight nucleoprotein complex (pre-integration complex, PIC).The PIC is transported into the nucleus for subsequent integration of proviral DNA into the host cell chromosomal DNA [19].The change in the amount of proviral DNA in the nucleus and the amount of integrated DNA are modelled with the following equations [15]: where: x 5 = [DNA nuc ] is the number of DNA molecules in the nucleus; The model parameters k int = 0.14 h −1 ; d DNA nuc = 0.001 h −1 ; d DNA int = 0.00002 h −1 ; represent the integration rate, the degradation rate of DNA in the nucleus and the degradation rate of DNA integrated into the chromosome, respectively.Their reference values and admissible ranges are specified in [15].

Transcription
The transcription of HIV starts after the infected cell receives external activation signals.It results in the synthesis of three types of messenger RNA (mRNA), i.e., the fulllength (around 9 kb), singly-spliced (around 4 kb) and doubly-spliced (around 2 kb) [20].The transcription stage is followed by the transport of mRNAs to the cell cytoplasm.The viral Tat and Rev proteins regulate the transcription and mRNA distribution.These stages are described by the following equations considering the specific parameterizations of the feedback regulation similar to those presented in [21,22]:  (12) where: x 7 = [mRNA g ] is the number of HIV mRNA molecules in the nucleus: g for genomic or full-length; x 8 = [mRNA ss ] is the number of HIV singly spliced (ss) mRNA molecules in the nucleus; x 9 = [mRNA ds ] is the number of HIV doubly spliced (ds) mRNA molecules in the nucleus; x 10 = [mRNAc g ] is the number of HIV mRNA molecules in the cytoplasm: g for genomic or full-length; x 11 = [mRNAc ss ] is the number of HIV singly spliced (ss) mRNA molecules in the cytoplasm; x 12 = [mRNAc ds ] is the number of HIV doubly spliced (ds) mRNA molecules in the cytoplasm.
The model parameters TR cell = 15 h −1 ; TR Tat = 1500 h −1 ; θ Tat = 10 3 molec.;θ Rev = 7.7 × 10 4 molec.;β = 0.9; ds = 0.12 h −1 ; represent the cell intrinsic rate of basal transcription, the level of transcription induced by Tat transactivation, the inhibitory effect of Rev on the splicing rates implying their 1/(1 − β)-fold reduction at the saturation level of Rev, the rate of splicing for full-length virus RNA, the rate of [mRNA g ] export from the nucleus, the rate of [mRNA ss ] export from the nucleus, the rate of splicing for singly spliced virus RNA, the rate of [mRNA ds ] export from the nucleus, the transport rate of [mRNA g ] to the cell membrane, and the degradation rates of [mRNA i ], i ∈ {g, ss, ds}, respectively.Their reference values and admissible ranges are specified in [15].

Translation
Ribosomes function to translate the viral mRNA into specific proteins.The Gag and Gag-Pol proteins are coded by the full-length mRNA.The gp160, Vif, Vpu, and Vpr proteins are coded by singly spliced mRNAs.Finally, the Tat, Rev, and Nef proteins are coded by doubly spliced mRNAs.The synthesized proteins then fold into active proteins.In our deterministic model we consider the kinetics of the Gag-Pol, Gag, gp160, Tat, and Rev proteins [15].The following set of differential equations is used:  (17) where: x 13 = [P Gag-Pol ] is the number of protein molecules: Gag-Pol; x 14 = [P Gag ] is the number of protein molecules: Gag; x 15 = [P gp160 ] is the number of protein molecules: gp160; x 16 = [P Tat ] is the number of protein molecules: Tat; x 17 = [P Rev ] is the number of protein molecules: Rev. The model parameters k trans = 524 h −1 ; f g,Gag-Pol = 0.05; f g,Gag = 0.95; f ss,gp160 = 0.64; f ds,Tat = 0.025; f ds,Rev = 0.2; k tp,Gag-Pol = 2.8 h −1 ; k tp,Gag = 2.8 h −1 ; k tp,gp160 = 2.8 h −1 ; d p,Gag-Pol = 0.09 h −1 ; d p,Gag = 0.09 h −1 ; d p,gp160 = 0.02 h −1 ; d p,Tat = 0.04 h −1 ; d p,Rev = 0.07 h −1 ; represent the rate of mRNA to proteins translation, f ij stand for the fraction of [mRNA i ] coding [P j ], i ∈ {g, ss, ds}, j ∈{Gag-Pol, Gag, gp160, Tat, Rev}.Following them, the parameters define the rates of protein [P j ] transport to membrane, j ∈ { Gag-Pol, Gag, gp160}, and the degradation rates of proteins Gag-Pol, Gag, gp160, Tat and Rev, respectively.Their reference values and admissible ranges are specified in [15].

Assembly, Budding and Maturation
The last stages of the HIV-1 replication cycle consist of the transport of the proteins (both regulatory and accessory), viral glycoproteins and the full-length mRNA molecules to the cell plasma membrane.The viral proteins (Gag-Pol, Gag, gp160) undergo a number of post-translational modifications.These include folding, oligomerization, glycosylation, and phosphorylation [23].The final step is the assembly of Gag and Gag-Pol proteins at the cell membrane followed by encapsidation of the viral RNA genomes.The budding of the synthesized virions ends with their maturation [24].The respective equations of the deterministic model read: where: ] are, respectively, the number of viral RNA molecules and the Gag-Pol, Gag, and gp160 viral protein molecules at the membrane.The model parameters represent the rates of RNA and protein [P j ] transport to the membrane, j ∈ {Gag-Pol, Gag, gp160}, the incorporation rate of molecules into pre-virion complexes, the number of viral RNA transcripts in a new virion, the number of Gag-Pol molecules in a new virion, the number of Gag molecules in a new virion, and the number of gp160 molecules in a new virion, respectively.Their reference values and admissible ranges are specified in [15].
The model parameter K V rel entering the virion complex assembly function, K V rel = 10 3 refers to the characteristic scale of the viral progeny per replication cycle [13].
The model parameters the membrane-anchored protein Gag-Pol, the membrane-anchored protein Gag, and the membrane-associated gp160, respectively.Their reference values and admissible ranges are specified in [15].
In the current model of HIV replication, the Michaelis-Menten-type parametrization is used to limit the rate of assembling of progeny virions by the least abundant protein component out of the three considered [P mem,Gag-Pol ], [P mem,Gag ], [P mem,gp160 ].The parameters K V rel , N Gag-Pol , N Gag , N gp160 specify the reference amount of the viral progeny per cell and the number of protein molecules Gag-Pol, Gag, and gp160 required for each virion, respectively.
Note that the above function f c (x 18 , . . ., x 21 ) is the only difference from the similar equations presented in [15], where the corresponding function reads as f c = x 18 x 19 x 20 x 21 .This modification was proposed in a model of influenza A virus replication [25,26].
At the late phase of virus replication, the viral RNA and viral proteins at the membrane associate with the pre-virion complex and then combine to generate a new virion, as shown in Figure 1 where: is the number of virions on the membrane; is the number of free viruses after budding from the cell; x 24 = [V mat ] is the number of mature virions outside the cell.The model parameters represent the incorporation rate of molecules into pre-virion complexes, the budding rate of new virions, the maturation rate, the degradation rate of the assembled pre-virion complex, the degradation rate for budded immature viral-like particles, and the clearance rate of free mature virions, respectively.Their reference values and admissible ranges are specified in [15].

Stochastic Markov Chain Modelling
The use of ordinary differential equations for the biochemical species concentrations assumes that they vary continuously and in a deterministic manner.However, some of the HIV-1 replication steps are characterized by low numbers of the reactants and the greater impact of the random fluctuations in the reaction rates, thus invalidating to a certain degree the deterministic approach.The stochastic re-formulation of the model allows one to overcome the limitations of the deterministic framework.A general approach to stochastic formulation is based on considering the reactions as Markov processes, i.e., random walks in the state space of the system, implemented numerically using Monte Carlo techniques.Markov chain model considers the exact number of species with a discrete change in their numbers according to the probabilities of the individual reaction to happen per unit of time.The later are defined by the reaction rates and the abundance of the chemical species of the underlying deterministic model listed in Table 1.We follow the Gillespie approach [17] as detailed below.The full details of the Markov chain are presented in Table 1.Here, superscript ++ in x ++ n means that this component is increased by one, and −− in x −− n means that this component is decreased by one.

Algorithm
The realizations of the stochastic Markov chain model are inherently variable in their dynamics.To robustly estimate the mean trajectories and uncertainty in the dynamics, an averaging is required over a large ensemble of realizations (∼10 5 ).This might appear to be computationally demanding and depends on the simulation time for a single run of the model.Therefore, a method for computer modelling of the Markov chain (1) should be fast and accurate.To model a Markov chain numerically, several methods have been proposed; the most often used is Gillespie's direct method [17,27,28].
During the process, some components can reach large numbers.This essentially slows down the computation process, as a high population number causes extremely short time intervals between events in the Markov chain.At the same time, the evolution of highly populated components can be modelled accurately enough by a deterministic process described by an appropriate ODE.To overcome this issue, a hybrid approach has been proposed in [29,30] in which the simulation process is switched from stochastic to deterministic for all components simultaneously as soon as the component with the smallest population size exceeds a certain threshold (see also [28]).In [31,32], a method was applied in which the deterministic dynamics for highly populated components and the stochastic dynamics for smaller populated components are coupled and computed in parallel.Here we further develop this approach and propose an algorithm with coupled stochastic and deterministic processes with capability to automatically switch the dynamics of any component, x n , from stochastic to deterministic and back as soon as this component exceeds a predefined threshold X or, respectively, becomes below it.Therefore, at any time interval between the transitions, all components are divided into two time-varying sets: S X = {n : x n ≤ X} and S X = {n : x n > X}.Components, x n ∈ S X , currently having a relatively small number of particles are modelled stochastically by the Markov chain described in Table 1.Other components, x n ∈ S X , with a large population size are modelled by the corresponding deterministic Equations ( 1)- (24).With the change of population, a component, x n , can be moved automatically from one set to another.
We also divide the reactions into the sets of stochastic ones, S T , for which the transition in Table 1 contains at least one stochastically modelling component, and deterministic ones, S T , with transitions without such a component at the current time interval.The stochastic reactions m ∈ S T are modelled by Gillespie's next reaction method: at every step two uniformly distributed random numbers r 1 , r 2 on (0, 1) are generated.The first number gives the next time step ∆t = −(ln r 1 )/A where A = ∑ m∈S T a m .The second random number determines the next reaction index p ∈ S T : the smallest integer satisfying A p > Ar 2 where A p = ∑ p m=1, m∈S T a m , i.e., the summations are made on stochastic reactions only.As we have to search between up to 51 reactions, the binary search is used to accelerated finding the reaction p.Additionally, to accelerate the computation, the propensity is updated only for reactions in which a m depends on changed components in the given step.For this purpose, a special array is prepared in which propensities to be updated are indicated for a given component x n and another similar array for every reaction m.As for deterministically varying components, x n , n ∈ S X , the corresponding equations from the ODE system (1)-( 24) are integrated employing the predictor-corrector method, as described in [32].To provide a proper accuracy, the time step is restricted by the value ∆t max .
The algorithm is implemented in C++.Here arrays of pointers to functions are actively used to directly call functions that should be calculated for a given reaction without spending time on other reactions; this also accelerates the computation.We compute functions of propensities a m for stochastic reactions or the right-hand sides of ODEs ( 1)-( 24) for deterministic reactions.
The hybrid code is flexible: for a negative threshold X < 0, only deterministic equations are integrated.Here the solution coincides with that shown in Figure 2 obtained by direct integration of ODEs ( 1)-( 24) with the use of high order adaptive step size methods.If the threshold is set extremely high, say, 10 10 , such that the probability to reach this value for any component is infinitesimal, then all components are treated as stochastic, modelled by Markov chain Table 1 and computed by Gillespie's method.This allows the estimation of the accuracy of the hybrid algorithm by comparing histograms for the hybrid and the fully stochastic models.We set the threshold X = 2000 for our computations.Our study showed that the computed statistical characteristics coincide very closely with those obtained by Gillespie's method.
The computations were run on an Intel Xeon E3-1220 v5 CPU 3GHz×4.One realization of the full model with [V free ](0) = 4 required about 14 s of CPU time, while the hybrid model was computed in 60 milliseconds on average.Thus, the use of the modified hybrid scheme proposed here accelerates the computation by a factor of 200.We computed 10 5 realizations for every value of an initial number of free virions [V free ](0) to obtain the statistic characteristics described in the next section.

Stochastic Modelling Results
The stochastic modelling enables computation of the probability of a cell to be infected in a certain time depending on the initial number of free virions attaching to the cell.Experimental data presented in [10] indicate that the number of provirus copies in splenic cells of SIVmac251-infected rhesus macaques ranged from one to three.A number of virions attaching to the target cell equal to four ([V free ](0) = 4) enables the integration of a similar number of proviral DNA, thus approximating the scenario of in vivo infection.The corresponding plots are shown in Figure 3A.There is a smooth monotonically growing dependence on [V free ](0) tending to one with the growth of this value.Furthermore, the probability of successful integration increases with time.The random effects in reaction kinetics result in a heterogeneous distribution of cells with respect to the amount of integrated proviral DNA (Figure 3B).The simulation results indicate that for the parameter settings specified in Table 1, infection with four virus particles leads to a distribution of integrated DNAs ranging from zero to four with the median value of two.The distribution of [DNA int ] defines the phenotypic diversity of secreted virions as shown in Figure 3C.Note that the above computation did not consider intracellular restriction mechanisms that would modify these values.
The evolution of the histograms of the stochastic model variables for initial condition [V free ](0) = 4, i.e., MOI = 4, is presented in Figure 4.They show that the abundance of the biochemical components underlying the virus replication is not homogeneous but displays multi-hump patterns indicated by strips with darker colours.The number of humps depends on the species and ranges from two to five.This is clearly seen in Figure 5, where the histograms of all components are presented at time t = 36 h for initial condition [V free ](0) = 4.
The variance in the intensities of individual reaction stages of the life cycle is shown in Figure 6.They are consistent with the transcription and translation data presented in [33].

Sensitivity Analysis
In a previous work [15], the local sensitivity of the deterministic model was computed using the adjoint method to identify the potential targets for antiviral therapy.As we have modified the model by considering the features of the assembly in more detail, we conducted the sensitivity analysis for the extended model.We considered the global variance-based method of sensitivity analysis, which is used to determine the contributions of individual parameter variations to the changes in the variance of the model output while allowing simultaneous variations across the whole input space of all model parameters.Let y = f (p) be the model output where p = [p 1 , . . ., p L ] is a vector of inputs, i.e., model parameters, which are treated as independently distributed random variables.The variancebased methods of global sensitivity analysis rely on the following decomposition of the output variance [34,35]: which includes the variance V i caused by varying p i alone, the variance of second-order interactions V ij caused by varying p i and p j simultaneously (additionally to individual V i and V j ) and of the other higher-order interactions, up to the variance V 1 2...L caused by varying all components.The first-order S i and total-order S total i sensitivity indices are defined as where p ∼i = {p l } l =i l∈ 1,L is a set of all parameters except p i .The sensitivity index S i measures the effect of varying parameter p i alone, averaged over variations in other parameters p ∼i , while S total i approximates the effects of variations of p i including all of the variance caused by its interactions with other parameters.
There are two well-established methods to compute the sensitivity indices specified in Equations ( 26) and ( 27).Sobol's method employs low-discrepancy sequences and Monte-Carlo integration methods to explore the input space of parameter variations.The extended Fourier amplitude sensitivity testing (eFAST) method makes use of Fourier decomposition to search in the frequency space along mono-dimensional search curves for each parameter, which makes it possible to obtain both first-and total-order indices with a lower number of model evaluations than with Sobol's method [36].

Sensitivity Analysis of the Deterministic Model towards the Cumulative Virion Release
To study the sensitivity of the processes towards the cumulative number of released virions throughout 36 h (the area under the curve-type of the metric for released virions dt , which characterises the availability of the virus for infection of other cells, or the total virus exposure), we applied the eFAST method to the deterministic model with [V free ](0) = 4 virions.We assumed the uniform distribution of model parameters in the ranges specified in [15].For the newly introduced parameter K V rel = 10 3 we assumed the range from 10 2 to 10 4 virions.As proposed in [37], we included in the input variables a "dummy" input, which is uniformly distributed in the unit segment and is not presented in the model.The total sensitivity index of dummy input S total dummy should have a low value that represents the variance from interactions between other parameters but not with the parameter of interest, which is included in the definition of the total order index (27) [37].The dummy input can be regarded as a negative control to determine significant sensitivity indices relative to the level of variance S total dummy .The first-and total-order sensitivity indices to the cumulative number of released virions throughout 36 h computed with the eFAST method are presented in Figure 7.We used N r = 20 resamples of the method to obtain the means and standard errors of the means of sensitivity indices, and to test for their significance with Welch's t-test.The sample size for each search curve was set to N s = 10 4 .The sensitivity indices of the parameters that have the significant effects are summarized in Table 2.An important aspect of HIV-1 infection is the establishment of a pool of latently infected cells.After the proviral genome integration into the chromosome of the host cell [DNA int ] the provirus may remain silent, i.e., transcription of viral genes can be suppressed by the infected cell.Although our model does not describe the mechanisms by which this suppression is regulated, we can investigate the impact of variations of the early stage processes on the distribution of the number of integrated proviral DNA molecules [DNA int ](36) using the stochastic model.
To perform a sensitivity analysis on the stochastic model, one needs to define the model output that can be compared between the ensembles of stochastic realizations with perturbed parameters.Given the ensemble of model runs, histograms and empirical cdfs approximate the pdf and cdf of the variables at certain times, and various distance functions defined on them can be used.In [38], the sensitivity analysis of stochastic Markov Chain models was performed using the histogram distance between the unperturbed model output and the output with perturbed parameters.The histogram distance between sets x and y is defined as where n x and n y are the number of elements in x and y, respectively, K is the number of histogram bins dividing the range from min{x min , y min } to max{x max , y max } in I k intervals, and χ(x j , I k ) is a characteristic function that is equal to 1 if x j lies in I k and 0 otherwise.The local sensitivity indices can be defined as where x 0 is the set of model outputs with baseline parameters, x p i , with the increased parameter p i on a small value ∆p i .The indices Ŝi were scaled on baseline parameter values to make their ranking possible.
Here, we only implement the local sensitivity approach due to the computational costs needed to obtain the ensemble of stochastic realizations with a fixed set of parameters.However, the low-cost screening methods of global sensitivity analysis, such as the Morris method [39,40], can also be effectively implemented for stochastic models [38].
The model output is the number of integrated proviral genomes x 6 = [DNA int ] at t = 36 h.We obtained an ensemble of 10 5 model runs for each parameter set.For this number of runs, the measured baseline self-distance D(x 0 , x 0 ) equals zero.The baseline parameter values are perturbed by 1%.In all stochastic realizations, the possible values for model output {0, 1, 2, 3, 4} are the same.Hence, K = 5.The sensitivity indices Ŝi are reported in Table 3.

Discussion
Our study provides a high-resolution model of the stochastic dynamics of HIV replication in productively infected cells.The model is built as a Markov chain and implemented by a Gillespie-type algorithm [17].In contrast to the deterministic prototype model [15], the stochastic version considers a Mechaelis-Menten type description of the engagement of Gag and Gag-Pol polyproteins and gp160 glycoproteins into an assembly of nascent virions forming around dimeric viral RNA.The numerical implementation of the model was characterised by using (i) a hybrid method to adaptively manage the low and high abundance of molecular components, (ii) binary search, and (iii) a list of pointers of active reaction components at each step.These make single runs fast and computationally less demanding.A global sensitivity analysis method, i.e., the extended Fourier amplitude sensitivity testing (eFAST) [36,37], was implemented.Note that the representation of the Mechaelis-Menten kinetics in the stochastic algorithm requires further systematic analysis and justification.
Extensive simulations with the model led to a number of biologically relevant insights concerning the heterogeneity of cell infection and viral replication.First, the infection with a fixed number of viruses per CD4 T cell (V 0 = [V free ](0)) leads to a heterogeneity of infected cells with respect to the number of integrated proviral genomes ranging from zero to V 0 (see Figure 3).This finding suggest that in the analysis of the mechanisms behind multiply infected cell populations, not only phenotypic differences in cell susceptibility need to be considered [11] but also the inherent randomness and the discrete nature of the biochemical stages of HIV infection.The multiplicity of infection translates into histograms of the viral progeny showing multimodal Gaussian-type distributions with clear peaks at some regularly spaced values with increasing variance from left to the right.Second, the computed dynamics of the distributions of the 24 HIV life cycle components allows one to identify the bottleneck factors in virus production (see Figure 5).For the reference set of the model parameters, these are Gag-Pol proteins.The stochastic model provides a direct insight into the intensities (fluxes) of the reaction steps through the life cycle of the virus (see Figure 6), which are comparable with experimentally measured transcription and translation events [33].Third, local and global sensitivity analysis of the stochastic model provided a ranking of model parameters that strongly affect the size of viral progeny, and thus can be considered as potential therapeutic targets.In contrast to the local sensitivity features of the deterministic model [15], the first three globally influential parameters are: • Transport of genomic mRNA to membranes; • Tolerance of transcription activation to Tat-mediated regulation; • Degradation of free and mature virions.
From a biological point of view, an extension of the model is required to consider antiviral defence mechanisms activated in the infected cells (e.g., type I interferon, SAMHD1, Tetherin) and the counteraction of the viral proteins (Vpu, Vif) inhibiting them.These mechanisms will impact the proviral copy numbers and virion burst size.In addition, the transport kinetics of the virus life cycle constituents requires a proper incorporation into the model [41].These will be subject of our future work.Overall, our results provide a new tool to simulate the HIV life cycle in infected cells and suggest that stochastic effects inherent in the HIV replication cycle must be considered among the relevant mechanisms contributing to the phenotypic diversity and variability of dynamics of HIV infection.A clear distinction between deterministic and stochastic components of HIV-infected host cell interactions will provide a better understanding of the origins of heterogeneity of the viral replication.The knowledge could be further utilized in the analyses of variability of other virus infections, such as HBV, HCV, SARS-CoV-2.

Figure 3 .
Figure 3. Probability of cell infection (i.e., integration of at least one proviral DNA molecule [DNA int ]) at a certain time as a function of the initial number of free virions [V free ](0) (A).Histograms of number of integrated proviral DNA molecules [DNA int ] (B), number of released matured virions [V mat ] (C) at t = 36 h for initial condition [V free ](0) = 4.

Figure 4 .
Figure 4. Temporal evolution of the histograms of the stochastic model variables for the initial condition [V free ](0) = 4.

Figure 7 .
Figure 7. Sensitivity indices for the cumulative number of released virions up to 36 h computed with the deterministic model.The bars and the error bars correspond to the means ± SEMs obtained on N r = 20 resamples.The significance levels of Welch's t-test for comparing total order indices against S total dummy are p < 10 −20 ( * * * ), p < 10 −10 ( * * ), p < 10 −5 ( * ).

Table 1 .
Markov chain: the list of individual reactions m, the corresponding state transitions and the propensities of reactions which define the probabilities of the individual reactions to occur over a small time interval δt.

Table 2 .
Sensitivity indices of the model parameters that have the biggest influence on cumulative virion release.The parameters are sorted in descending order by total order sensitivity indices.The means ± SEMs obtained on N r = 20 resamples are indicated.

Table 3 .
Local sensitivity indices Ŝi of the stochastic model parameters having the most influence on the number of integrated proviruses [DNA int ](36).