Mathematical Modeling Suggests Cooperation of Plant-Infecting Viruses

Viruses are major pathogens of agricultural crops. Viral infections often start after the virus enters the outer layer of a tissue, and many successful viruses, after local replication in the infected tissue, are able to spread systemically. Quantitative details of virus dynamics in plants, however, are poorly understood, in part, because of the lack of experimental methods which allow the accurate measurement of the degree of infection in individual plant tissues. Recently, a group of researchers followed the kinetics of infection of individual cells in leaves of Nicotiana tabacum plants using Tobacco etch virus (TEV) expressing either Venus or blue fluorescent protein (BFP). Assuming that viral spread occurs from lower to upper leaves, the authors fitted a simple mathematical model to the frequency of cellular infection by the two viral variants found using flow cytometry. While the original model could accurately describe the kinetics of viral spread locally and systemically, we found that many alternative versions of the model, for example, if viral spread starts at upper leaves and progresses to lower leaves or when virus dissemination is stopped due to an immune response, fit the data with reasonable quality, and yet with different parameter estimates. These results strongly suggest that experimental measurements of the virus infection in individual leaves may not be sufficient to identify the pathways of viral dissemination between different leaves and reasons for viral control. We propose experiments that may allow discrimination between the alternatives. By analyzing the kinetics of coinfection of individual cells by Venus and BFP strains of TEV we found a strong deviation from the random infection model, suggesting cooperation between the two strains when infecting plant cells. Importantly, we showed that many mathematical models on the kinetics of coinfection of cells with two strains could not adequately describe the data, and the best fit model needed to assume (i) different susceptibility of uninfected cells to infection by two viruses locally in the leaf vs. systemically from other leaves, and (ii) decrease in the infection rate depending on the fraction of uninfected cells which could be due to a systemic immune response. Our results thus demonstrate the difficulty in reaching definite conclusions from extensive and yet limited experimental data and provide evidence of potential cooperation between different viral variants infecting individual cells in plants.


Introduction
With a burgeoning human population expected to reach between 7 and 13 billion by 2100, humans' lifeblood, food and water, will be evermore difficult to protect and sustain over time [1][2][3]. Under these circumstances, dependence on agriculture will only increase [4]. Food crops, however, are vulnerable to numerous biotic stresses, including but not limited to animal pests, fungi, bacteria, and viruses. Viral infections especially can devastate food-crops, with documentation of such infections being identified as early as eighth-century Japan; however it was not until the nineteenth century that it was known and accepted that microscopic agents like viruses could cause diseases in plants [5,6].
Mathematical models have been widely used to understand virus-plant interactions. For example, early studies investigated how the virus concentration in the inoculum influences the number of lesions formed by the virus on plant leaves [7][8][9][10]. More recent studies investigated virus dynamics in individual plant cells or in the whole plant [11][12][13][14]. Most studies, however, focused on understanding epidemiological spread of viral disease in plant populations with the aim to control viral spread and to limit damage to agricultural crops [15][16][17][18][19][20][21][22][23][24][25].
Mechanisms of viral spread within individual plants remain incompletely understood. Usually infection of a single cell or a small group of cells occurs via mechanical means or by an animal or insect vector. After replication in the inoculation site, virions move to neighboring cells through plasmodesmata, pores between individual cells in the leaf [26]. The replication-movement process is repeated until the virus enters the vasculature. It has been experimentally demonstrated that viral distribution via the vasculature follows the path of sugar distribution, i.e., from source to sink tissues, with strong sinks like roots receiving a larger portion of the viral cargo [27]. Once arriving at sink tissue, the virus exits the vasculature via the plasmodesmata and enters neighboring cells. From there viruses use plasmodesmata once again to invade the ground tissue [27,28]. In some cases, however, viruses can be introduced directly into the vasculature resulting in rapid infection of sink tissues.
Different methods have been used to measure the degree of infection of a given leaf in the plant including ELISA for viral proteins and PCR for viral genomes [12,29]. However, these methods are semi-quantitative and typically do not allow measurement of the degree of infection of individual cells in the leaf. Recently, a new method to measure the frequency of infection of cells in plant leaves through the use of flow cytometry was developed [14]. In their experiments, Tromas et al. [14] infected lower (3rd) leaves of 4 week old Nicotiana tabacum (henceforth referred to as "tobacco") plants with two strains of Tobacco etch virus (TEV), TEV-Venus and TEV-BFP, carrying different fluorescent proteins. At different times after the infection cells (protoplasts) were isolated from individual leaves, and the fraction of protoplasts infected with either or both viral variants was quantified using flow cytometry [14]. Flow cytometry allowed the measurement of virus infection in thousands of individual cells, thus providing unique quantitative information about kinetics of TEV infection in tobacco plants.
Tromas et al. [14] performed several important analyses including calculation of basic reproductive number and multiplicity of infection of cells (MOI) by different viruses. In addition, the authors developed a detailed mathematical model of how the virus spreads over time from the 3rd leaf to other leaves and fitted the model to experimental data. Importantly, the model was able to accurately describe virus dissemination and predicted that viral spread kinetics was similar within the leaves. One major difference between infection levels in individual leaves was due to different import rates of the virus from the lower to upper leaves [14].
Here we built upon this pioneering work and further analyzed the experimental data of Tromas et al. [14] with use of mathematical models. The main objectives of our study were to understand the details of dissemination of TEV in tobacco plants and to determine if coinfection of individual plant cells with two TEV variants occurs independently. Specifically, because exact pathways of TEV dissemination in tobacco plants have not been unequivocally identified and may depend on the age of plants and details of virus inoculation, we investigated whether mathematical models of TEV dissemination, alternative to the Tromas et al. [14] model, may be also consistent with the data. Surprisingly, we found that indeed many different routes of TEV dissemination (e.g., when the initial infection first spreads in top (7th) leaf and then disseminates to lower leaves) are quite consistent with experimental data, even though some such models fitted the data with slightly reduced quality (as evaluated by AIC, [30]). By analyzing kinetics of coinfection of individual cells by two TEV variants we found that coinfection does not proceed randomly; rather, cells are more likely to be coinfected with two viruses than infected with either of the variants suggesting cooperativity in infection (or that plant cells vary in susceptibility to infection). Our results suggest that understanding pathways of virus dissemination in plants will be difficult using only data on virus infection in individual leaves and may likely require specific experiments that determine the systemic distribution of virions in host tissues over the time course of infection.

Data
Specific details of how infection of plants had been performed are given in the previous publication [14]. In short, 4-week-old Nicotiana tabacum L. cv. Xanthi plants, a widely used model plant host [31], were inoculated into the 3rd leaf with an equal mixture of TEV-BFP and TEV-Venus. These two viral strains express blue and yellow fluorescent proteins, respectively. Preliminary work demonstrated that expression of these proteins does not impair growth kinetics of the viral variants [14]. To measure the kinetics of viral dissemination 3rd, 5th, 6th, and 7th true leaves of individual plants were removed at days 3, 5, 7 and 10 post inoculation; five plants per time point were analyzed. Leaf 4 was skipped because it did not show any infection under the experimental conditions. From these leaves, plant cells with their cell walls removed (protoplasts) were isolated and the number of protoplasts expressing none, one, or both of the two fluorescent proteins was measured with flow cytometry. The data have been formatted and are available as a supplement to this paper.

Mathematical Models 2.2.1. Original Virus Dissemination Model of Tromas et al.
To predict kinetics of infection of the inoculated leaf and dissemination of infection to other leaves in the plant Tromas et al. [14] developed a novel mathematical model. The model tracks the fraction of infected cells in a kth leaf, I k , over time with S k being the fraction of susceptible cells. In the model a cell infected with either of two viral variants or both viral variants is considered to be infected. The model assumes that infection starts at leaf 3 and then proceeds in the leaf k = 3 at a rate β and disseminates to upper leaves (leaves 5, 6, 7) at a rate proportional to the total infection rate of the leaves below a given leaf k at a rate χ k ( Figure 1A). When the virus reaches other leaves, infection also proceeds locally at a rate β. Local virus dissemination at a kth leaf stops when the fraction of infected cells reaches a critical level ψ k : where β is the rate of infection of uninfected cells in the leaf during to local viral spread in the leaf, χ k is the rate of virus infection of other (upper) leaves, and ψ k is the level at which infection of new cells in leaf k stops (see Equations (S1)-(S4) for full set of equations of the model). Initial conditions for the model are I k (0) = I 0 if k = 3 and I k (0) = 0, otherwise, and S k (0) = 1. In total, this model (as well as alternative models 1-7) has 9 parameters to be estimated from the data.

Alternative Virus Dissemination Models for the Total Leaf Infection
While the original mathematical model of Tromas et al. [14] seems logical we sought to investigate whether alternative mathematical models of virus dynamics within individual leaves and virus dissemination to other leaves in the plant may be consistent with exper-imental data. In most of these alternative models we use the same nomenclature for the model parameters (I 0 , β, χ k , and ψ k ) as in Tromas et al. [14].  [14], two different viruses ("Venus" and "BFP") were rubbed into leaf 3 and the fraction of infected leaf cells (protoplasts) was followed by flow cytometry over time (see Section 2 for more detail). In schematics, S k and I k denote the fraction of uninfected and infected cells in the kth leaf, respectively, and the syringe indicates the primary place where infection started in the model. Arrows denote the process of leaf infection (at a rate β) and transmission of infection between leaves (at a rate χ). In the original Tromas et al. [14] model ((A), Equations (1)-(3)), infection starts at leaf 3 and is then transported to other leaves at a rate proportional to the total fraction of infected cells in leaves below. In alternative model 1, infection starts with leaf 3 but upper leaves are only infected by the leaves just below them ((B), Equation (4)). In alternative model 4, infection starts in leaf 5 and then proceeds to leaves above or below leaf 5 similar to the alternative model 1 ((C), Equation (11)). Finally, in the alternative model 7 infection starts at the upper leaf 7 and proceeds to lower leaves in a manner similar to the original Tromas et al. [14] model ((D), Equation (15)). Other alternative models are described in the Section 2.
• Alternative model 1. In this model the dynamics of infection of the leaf 3 is given in Equation (1), and instead of summing the infection from all the leaves below, we suppose that only the leaf immediately below the one in question can infect it. The dynamics of uninfected leaves is given by Equation (3). Dynamics of infection in other leaves is described by the following equations (see Figure 1B): Initial conditions for the model are I k (0) = I 0 if k = 3 and I k (0) = 0, otherwise, and S k (0) = 1. Note that in this model infection of leaf 5 occurs from leaf 3 and not leaf 4. • Alternative model 2. Leaf 3 infects only leaf 5 which then infects leaves 6 and 7. Leaf 6 also contributes to the infection of leaf 7. Infection for leaf 3 is given by Equation (1) and dynamics of uninfected leaves is given by Equation (3). Dynamics of infection in other leaves is described by the following equations: dI k dt = βI k S k + χ k S k I 5 , k = 6, 7.
Initial conditions for the model are I k (0) = I 0 if k = 3 and I k (0) = 0, otherwise, and S k (0) = 1. Note that infection of leaf 5 occurs by leaf 3 and not leaf 4. • Alternative model 3. Infection for leaf 3 is given by Equation (1) and dynamics of uninfected leaves is given by Equation (3). Leaf 3 is the only leave that contributes to infections of higher leaves. Dynamics of infection in other leaves is described by the following equations: Initial conditions for the model are I k (0) = I 0 if k = 3 and I k (0) = 0, otherwise, and S k (0) = 1. • Alternative model 4. The initial infection occurs on leaf 5 which contributes to infections of leaves 3, 6, and 7. All imported virions for these leaves come exclusively from leaf 5. Dynamics of uninfected leaves is given by Equation (3). Dynamics of infection in other leaves is described by the following equations (see Figure 1C): Initial conditions for the model are I k (0) = I 0 if k = 5 and I k (0) = 0, otherwise, and S k (0) = 1. • Alternative model 5. The initial infection occurs on leaf 6 which contributes exclusively to the infections of leaves 3, 5, and 7. Dynamics of uninfected leaves is given by Equation (3) and dynamics of infection in other leaves is described by the following equations: dI k dt = βI k S k + χ k S k I 6 , k = 3, 5, 7.
Initial conditions for the model are I k (0) = I 0 if k = 6 and I k (0) = 0, otherwise, and S k (0) = 1. • Alternative model 6. The initial infection occurs on leaf 7 which contributes exclusively to the infections of leaves 3, 5, and 6. Dynamics of uninfected leaves is given by Equation (3) and dynamics of infection in other leaves is described by the following equations: Initial conditions for the model are I k (0) = I 0 if k = 7 and I k (0) = 0, otherwise, and S k (0) = 1. • Alternative model 7. The initial infection occurs on leaf 7 and virus accrues downward; it is essentially the model by Tromas et al. [14] being inverted. Dynamics of uninfected leaves is given by Equation (3) and dynamics of infection in other leaves is described by the following equations (see Figure 1D): Initial conditions for the model are I k (0) = I 0 if k = 7 and I k (0) = 0, otherwise, and S k (0) = 1. • Alternative model 8. The model assumes that infection starts in all leaves and proceeds independently (a.k.a. "logistic" model for individual leaves). Dynamics of infection in all leaves is described by the following equations: Initial conditions are I k (0) = I 0 k , k = 3, 5, 6, 7. This model has 12 parameters to be estimated from the data. • Alternative model 9. In all previous models virus dissemination within a given leaf stops when the fraction of infected cells reaches ψ k (e.g., Equation (3)). This stop of infection is also observed in the data. However, specific mechanisms of why the infection stops while not all cells in the leaf are infected were not fully investigated. Therefore, in our alternative model we assume that the dynamics of virus infection in a given leaf are not infection level-dependent but instead time-dependent. We define T k to be the time that the kth leaf accumulates the "immune response" to stop the spread of the virus inside it, and n k represents how quickly this immune response kicks in [32]. The dynamics of the infection is given by the same equations as in the Tromas et al. [14] model (Equations (1) and (2)), and the dynamics of uninfected cells available for infection due to generation of the immune response in the kth leaf is given by where the initial conditions for the model are I k (0) = I 0 if k = 3 and I k (0) = 0, otherwise. This model has 4 extra parameters as compared to other alternative models but the model can be reduced in size by assuming that some of the parameters (e.g., T k or n k ) to be leaf number-independent (see Main text for results). In such cases, the model has 10 parameters to be estimated from the data.

Virus Dissemination Models for the Infection/Coinfection with Two Viral Variants
In the experiments, the plants were infected with an equal mixture of two viral variants, TEV-Venus and TEV-BFP [14]. However, the original model of Tromas et al. [14] and our previous alternative models did not discriminate between infection of the cells with two variants. The following alternative models now make this distinction. In these models we denote V k and B k as the fraction of Venus-infected and BFP-infected cells, respectively, and the fraction of coinfected cells is denoted as M k . Because our analysis illustrated that the specific pathway of TEV dissemination in 4-week-old tobacco plants cannot be fully resolved using infection data alone, we assume the dissemination pathway of Tromas et al. [14]. Then the dynamics of infection of plant leaves with the two viral variants we use the following equations: where β B and β V are the within-leaf infection rates for BFP and Venus viruses, respectively, and I k (t) = V k + B k + M k . Note that we assume that virus dissemination to upper leaves is strain-independent. The initial conditions for all the following models are V k (0) = V 0 , B k = B 0 , and M k (0) = M 0 if k = 3 and 0 otherwise. To describe the kinetics of viral coinfection we consider several alternative mathematical models.
• 1-alpha coinfection model. In this model, we describe the coinfection growing as dependent on the within-leaf spread dynamics of both viruses. Here, and in other models V k B k is proportional to the rate at which coinfections are expected to arise by chance. We sum these these rates assuming that cells are first infected by one variant and then coinfected with another, and use a scaling factor α to indicate synergy (α > 1) or inhibition (α < 1) of the coinfection process as compared to random, mass action-like infection process: This model has 12 parameters ( Figure 2A). • 2-alpha coinfection model. We assume that the rate of coinfection may proceed differently by the two viral strains denoted by α 1 and α 2 which is a simple extension of the 1-alpha coinfection model (Equation (21)): This model has 13 parameters. • Probabilistic model. Because V k and B k measure the fraction of cells infected by the particular virus in the kth leaf, then for fraction of coinfected cells, M k , we can think of the probability of a cell being infected by both strains as being determined by B k V k . We can then use parameter α to measure how much more or less often coinfection is happening as compared with random chance: α = 1 means coinfection is behaving like a random process; α < 1 means coinfection is occurring less often than it would by random chance, and α > 1 means coinfection is occurring with greater frequency than random chance [33]. Multiplying by α the product B k V k and differentiating it with respect to t gives: which then with the use of Equations (18) and (19) results in the following model for the dynamics of coinfected cells: This model has 12 parameters. Note that in contrast with previous models (e.g., Equation (22)), in this model coinfection within the leaf depends on the fractions of uninfected (S k ) and virus-infected cells in the leaf (V k and B k ). • 2-alpha probabilistic model. As in the original Tromas et al. [14] model, the equation for coinfection in the probabilistic model is composed of two parts (Equation (24)): the first term with parameters β B and β V represents the within-leaf spread, and the second term with the parameter χ k represents the leaf-to-leaf spread. It seemed reasonable that coinfection may be driven more by one form of spread or the other, so we used α 1 and α 2 to measure their respective contributions: This model has 13 parameters ( Figure 2B).
• Logistic model for coinfection growth. The details of how plant cells become coinfected by two different viruses during the local spread are not fully understood. Because typically plant viruses spread to adjacent cells via plasmodesmata, a coinfected cell may be a source of both viral strains when infecting neighboring cells. In this alternative model we therefore assume that the frequency of coinfected cells increases randomly due to viral dissemination systemically from other leaves and logistically due to local, within-leaf spread: This model has 12 parameters.  (25)). Therefore, we use α 1 and α 2 to differentiate between coinfection occurring as within-leaf and leaf-to-leaf/systemic spread, respectively: This model has 13 parameters.  (25)). Major model parameters such as β and χ k have the same meaning as in the previous models (e.g., Figure 1).
We only show what happens in leaf 5 because in the 2-alpha probabilistic model ((B), Equation (22)), the connections between higher leaves become very complicated and difficult to illustrate in a figure such as this one. Like Figure 1, arrows represent the transmission of virions. In the 1-alpha coinfection Model ((A), Equation (21)), coinfection comes from the combination of the Venus and BFP viruses within leaf 5 only. In the 2-alpha probabilistic model ((B), Equation (22)), coinfection comes also from the combination of Venus and BFP virions in leaf five, but is also fed by the combination of virions imported from leaf 3 and combining with their opposite, e.g., Venus from leaf 3 combining with BFP from leaf 5. Two separate alpha terms are used to distinguish dynamics between within-leaf growth and infection from virions imported from lower leaves.

Statistical Treatment
To fit models to data we used two alternative approaches. Tromas et al. [14] proposed to use the following binomial distribution-based likelihood to fit the models to data where L is the likelihood of the model given the data, I k,t,p is the model prediction for the frequency of infection (by either or both viral variants) of the particular leaf k and time point t of a plant p, V k,p,t is the number of infected cells observed in a sample, A k,p,t is the total number of cells observed in the sample (k is the leaf number, p = 1 . . . 5 is the plant replicate number, and t is the day on which the observation was made). The model parameters are estimated by minimizing the negative log likelihood nll In the "coinfection" models we track the dynamics of cells infected with individual viral strains as well as coinfected cells. In these models I V k,p,t , I B k,p,t , I M k,p,t represent the model predictions for the frequency of Venus-or BFP-infected, or coinfected cells, respectively. Therefore, to fit the coinfection models to data we extended the binomial distributionbased likelihood in the following way. We let V V k,t,p , V B k,t,p , V M k,t,p be the number of cells infected by Venus, BFP, or both, respectively, as was measured experimentally. Note that and nll is simply where the best fit parameters are found by minimizing the nll. Binomial distribution-based likelihood takes into account the number of cells (protoplasts) extracted from each leaf. The total number of extracted cells varied dramatically between leaves (by up to 8 fold). It was therefore possible that different numbers of cells in the data may skew the likelihood-based estimates towards measurements with more cells. We therefore aimed to investigate whether other methods, e.g., assuming normally distributed data, i.e., normal distribution-based likelihood or least squares, can be used to fit the models to data. We tried several different ways of how least squares could be used to fit the models to data.
One approach is to use the frequency of infected cells I k,t,p as predicted by the mathematical model with the data V k,t,p /A k,t,p . For the models that only consider uninfected and infected cells (i.e., cells infected with either viral variant or coinfected with both variants), the sum of squared residuals (SSR) was then calculated as follows: In our analyses we found that such a method does not typically result in normally distributed residuals (see Results section for details). Given large variability in the fre-quency of infected cells over time we applied log-transformation to the data and the model predictions and calculated the SSR using the following formula: where the notations are the same as in Equation (34). Log-transformation of the data, however, is problematic because in 2 cases of leaf 5 infection, the measured frequency of infected cells was 0. One approach was to remove such data points from the analysis but data removal can generate biases in the model fits, and therefore, we opted for a more appropriate approach whereby we replaced zeros in the data and the model predictions with the limit of detection (LOD). LOD in the data for infected cells was defined as the lowest value of the frequency of infected cells found in the data (for infected cells LOD = 5.12 × 10 −4 ). Similarly to Equation (33) we used the following definition for SSR to fit the coinfection models to the data on the frequency of cellular infection with Venus and the following is the log-transformed variant where data in which the frequency of infected cells was zero, we replaced these zero values with the LOD for frequency of cells infected with different viral variants as LOD Venus = 8.43 × 10 −5 , LOD BFP = 3.26 × 10 −4 , and LOD Mixed = 3.2 × 10 −5 . For binomial distribution-based likelihood, confidence intervals for best fit parameters were estimated by bootstrapping the data with replacement (sampling a given plant) 1000 times [34]. For least squares, we used routine minimize from the python library lmfit that provided 95% confidence intervals for the estimated parameters.
To compare alternative mathematical models we used Akaike Information Criterion, AIC, that are calculated differently for binomial distribution-and normal distributionbased (least squares) likelihoods [30]: where N is the number of data points in the sample (in this case N = 80), and N par is the number of model parameters estimated by fitting the model to the data. Note that AIC differences ∆ of 0-4 are typically considered to be small while a difference of 10 indicates inferiority of the model in describing the data [30]. If plant cells are infected randomly by two different strains of the virus we expect that the frequency of coinfections with two viruses should be proportional to the product of the frequency of infections with single viral strains. To estimate the deviation from the random coinfection we used Odds Ratio of infection (OR) proposed previously to estimate deviation from random coinfection for HIV [33]: where A k,t,p − V k,t,p is the number of uninfected cells and V k,t,p = V V k,t,p + V B k,t,p + V M k,t,p is the total number of infected cells in the data for the kth leaf, time point t, and plant p.

Programming Details
All major analyses were done in Python (ver. 3.7.2) and some analyses were repeated in R (ver. 3.9.1). Python libraries used were matplotlib (ver. 3.3.2), Pandas (ver. 1.1.3), NumPy (ver. 1.19.0), lmfit (ver. 1.0.1), and SciPy (ver. 1.5.2). To solve the ODE-based models we used the odeint routine from scipy.integrate package. To fit models to data we used a differential evolution algorithm when the goodness of fit metric was nll, and when minimizing least squares residuals we used the Levenberg-Marquardt algorithm with a trust region. Both methods are part of Python's lmfit library. To ensure reproducibility of our results as a part of this publication we share the data and the code to fit the original virus dissemination model to data using either binomial distribution-based likelihood or least squares, and the code to illustrate the impact of various parameters on the virus dynamics according to the 2-alpha probabilistic model (Equation (25)).

The Experimental Dataset of the Kinetics of TEV Spread
In their original study, Tromas et al. [14] manually introduced two different strains of TEV to the third leaf of the 4 week old tobacco plants and counted the number of infected and uninfected cells in different leaves (k = 3, 5, 6, 7) of the infected plants over time (t = 3, 5, 7, 10 days). Given that plant cells are immotile and are surrounded by cellulosic cell walls, viruses can infect other cells in the leaf via two ways: (1) by passing through pores in the cells' membranes and cell walls (called plasmodesmata) creating portals between adjacent cells, or (2) by entering the vasculature and migrating with phloem to other (sink) leaves of the plant [35]. Over time, the viral infection disseminates unequally between the leaves ( Figure 3 and Supplemental Figure S1). In particular, only about 10% of all cells in the originally inoculated leaf 3 become infected by 10 days of infection ( Figure 3A), while on average 30% of cells become infected in leaves 6 and 7 ( Figure 3C,D). Interestingly, leaf 5 becomes minimally infected ( Figure 3B), and infection did not spread to leaf 4 [14]. There was great variability between infection of leaves in individual plants; for example, in leaf 7 by day 10 less than 10% of cells were infected in one plant while over 40% were infected in another plant ( Figure 3D).

Model with Tromas et al. Parameter Values Does Not Match the Data
To estimate basic parameters determining kinetics of TEV spread in tobacco plants, Tromas et al. [14] developed a mathematical model assuming that virus infection proceeds locally in each leaf and spreads from lower to upper leaves ( Figure 1A). Via several model iterations, the model in which within-leaf virus spread was leaf number-independent but the virus transport to upper leaves from the lower leaves was leaf number-dependent, fitted the data with best quality [14].
To verify these results we simulated virus spread dynamics using Tromas et al. [14] published model equations (Equations (1)-(3)) and parameter values (Table 1) and compared model predictions with the data (provided by Tromas et al. [14]). Surprisingly, the model predictions did not match the average infection levels observed in the data (solid lines in Figure 3). While we did not fully know the exact reasons for this discrepancy, we found that if we were to shift the infection trajectories predicted by the model by 3 days, the model predictions matched the data relatively well (Supplemental Figure S2). We therefore hypothesize that when numerically solving the model, Tromas et al. [14] may have initiated the solver starting at day 3 post infection given that it is the first time point at which experimental measurements were taken. (It is typical to obtain model predictions for times as given in the data, and solvers in R or python typically take the first time point as the time at which initial conditions are provided and not at the time 0 as is often assumed in models.)   Table 1. Table 1. New parameter estimates for the basic mathematical model of virus spread in plants. We list the parameter estimates and 95% confidence intervals (CIs) of the basic mathematical model of viral spread in plants (Equations (1)-(3)) as provided by Tromas et al. [14] ("Original parameters (nll)") or by fitting the model to data in this work using using binomial distribution-based likelihood ("New parameters (nll)", Equation (29)) or using least squares with a logarithmic transform ("New parameters (log LS)", Equation (35)). Fits of the mathematical model for two sets of model parameters are given in Figures 3 and 4. Confidence intervals for best fit parameters were generated using bootstrap by resampling the data (for likelihood-based fits) or were provided by the routine minimize in from python library lmfit for least square-based fits (see Section 2 for more detail).   (29), [14]) or least squares for log-transformation of the data and model predictions (Equation (35)) to fit the basic mathematical model (Equations (1)-(3)) to the virus spread data for leaf 3 (A), leaf 5 (B), leaf 6 (C) and leaf 7 (D). Data on proportion of virus-infected cells are shown by markers and lines are the predictions of best fit models. Parameters for the model fit using likelihood and least squares with the log transform are given in Table 1 ("New parameters (nll)" and "New parameters (Log LS)" columns, respectively). In fitting the models using least squares for log-transformed data, the limit of detection (LOD) was LOD = 5.12 × 10 −4 .

Parameter
To check that the virus dissemination model of Tromas et al. [14] is consistent with experimental data we fitted the model to the data using binomial distribution-based likelihood (see Section 2 for more detail). Importantly, the model fitted the data visually with good quality (dashed lines in Figure 3) indicating consistency of the model with the data. Interestingly, while some model parameters, such as ψ k , varied little between the original and corrected values, others such as I 0 or χ k differed substantially (Table 1). While confidence intervals for newly estimated parameters of the Tromas et al. [14] model are a bit large, we found that there is large difference in AIC Lik for this model when used with previously published Tromas et al. [14] parameters and our new estimates (∆ Lik > 100). Thus, our analysis provided updated and correct estimates of parameters characterizing kinetics of TEV spread in tobacco plants in the Tromas et al. [14] model.

Fitting the Models Using Binomial Distribution-Based Likelihood or Normal Distribution-Based Likelihood (Least Squares) Delivers Similar Parameter Estimates
In their study, Tromas et al. [14] proposed the use of binomial distribution-based likelihood to fit the models to data. In this approach, the probability of a plant cell being infected was treated as a Bernoulli trial in which A total cells are sampled, and the number of infected cells V is determined. While it seemed reasonable, it was not fully justified why such a likelihood is a good choice. There may be several potential issues with it. First, because there was a large variability in the total number of cells recovered from different leaves (from minimal 4314 to maximal 32,168 protoplasts/leaf), the data are unbalanced. Sources of such variability, however, are not entirely clear and may be due to variation of leaf or cell sizes but also may be related to difficulty of isolating protoplasts from leaves [36]. Parameter estimates may be biased if the fit favors better description of the data with the larger number of isolated cells. Second, while the large number of cells isolated may indicate certainty in estimation of the frequency of infected cells in a sample, there is a great variability in frequency of infected cells in the same leaf number between individual plants (e.g., Figure 3D), and binomial distribution-based likelihood may not adequately take such variability into account. Third and finally, given that a relatively large number of cells was measured in each leaf (>10 3 ), the distribution of the fraction of infected cells per central limit theorem may approach normal distribution, and therefore, one could use a normal distribution-based likelihood (least squares) for fitting models to data. Therefore, we fitted the Tromas et al. [14] model (Equations (1)- (3)) to the data using several different versions of least squares (see Equations (34) and (35) and Section 2 for more detail). Surprisingly, independent of the method used, the model predictions of the binomial distribution-based fits or least squares fits were nearly identical (e.g., Figure 4) and with a minimal, statistically non-significant difference in the parameter estimates for both fits (Table 1). Therefore, this result suggests that it may be reasonable to use least squares (or more generally, normal distribution-based likelihood) to fit virus dissemination models to these data. We did, however, find that not all least squares-based methods were appropriate. In particular, least squares with the frequency of infected cells resulted in skewed, nonnormally distributed residuals (Shapiro-Wilk test, W = 0.785, p = 1.887 × 10 −9 ). Some of the traditional approaches, for example the arcsin( √ x) transformation for the frequency of infected cells did not normalize the residuals (W = 0.803, p = 5.745 × 10 −9 ), however, log-transformation in which zero values were replaced with the limit of detection (LOD, see Section 2 for more detail) nearly did (W = 0.963, p = 0.021). Therefore, this analysis suggests that log-transformation of the data and model predictions is a viable alternative to the binomial distribution-based likelihood method of Tromas et al. [14] that may better account for variability in the frequency of infected cells between individual plants.

An Alternative Model with Variable Within-Leaf Replication Kinetics Is Consistent with Observed Viral Spread Kinetics
In their analysis Tromas et al. [14] investigated which parameters of the virus dissemination model may vary with the leaf number. By comparing alternative models they found that ψ and χ must be leaf-dependent to explain the data accurately. However, in that analysis they did not investigate if differences in virus dissemination may be due to variable within-leaf replication kinetics, determined by the parameter β, and not due to virus dissemination rate between leaves χ. Interestingly, we found that the alternative model (based on Equations (1)-(3)) in which β k and ψ k vary with the leaf number k (i.e., virus dynamics in a given leaf is determined mainly by the local spread in the leaf) while systemic dissemination of the virus to upper leaves is constant (χ k = χ) fitted the data with similar quality (as judged by SSR or AIC) as the original model. This alternative model has an extra parameter because of four β for four leaves studied while in the original model χ was defined for three leaves only. Yet, this result already suggested that the data on variable virus accumulation in different leaves can be explained equally well by differences in how much virus is delivered to upper leaves (χ k ) or by differences in how the virus replicates and spreads in individual leaves (β k ).

Alternative Models with Differing Patterns of Viral Dissemination Are Largely Consistent with Observed Viral Spread Kinetics
We next questioned whether a specific pattern of virus dissemination from the inoculated leaf 3 to the upper leaves can be determined from these experimental data. While there is a general understanding of how viruses in plants disseminate after a local infection (e.g., [28]) details of the dissemination may vary by the plant species, age, conditions in which the plant was grown, the virus species, inoculation method, and many other details. For example, the time when individual leaves become sources or sinks for sugar transportwhich will influence virus dissemination pathways-depends on many environmental and developmental factors [27]. Because many of these details are unknown for a specific experimental set-up, we investigated if the information provided by the experimental data on the fraction of infected cells in individual leaves over time is sufficient to establish a pattern for systemic viral dissemination.
Therefore, we developed a series of alternative mathematical models in which the pattern of virus dissemination differed in multiple ways from the original dissemination model of Tromas et al. [14] (Figure 1B-D and Equations (4)- (17)) and fitted these models to data. For example, alternative model 1 assumed that virus dissemination to upper leaves occurs only from the leaf below it, i.e., from leaf 3 to leaf 5, and then from leaf 5 to leaf 6 and so on ( Figure 1B). Alternative model 7 assumed that even though virus inoculation occurred at leaf 3, via access to vasculature, the virus immediately disseminated to leaf 7, and then spread to lower leaves ( Figure 1D and see Section 2 for details for other models). Some of these alternative models should not be necessarily considered as inappropriate because, for example, at day 3 after infection, leaf 6 on average had already nearly twice the frequency of infected cells as leaf 3 (0.014 vs. 0.009).
Finding the best fit model depended strongly on the statistical method used for fitting models to data. For example, using binomial distribution-based likelihood method suggested that best fit is provided by the alternative model 2 with the Tromas et al. [14] model fitting the data significantly worse (Supplemental Table S1, ∆ AIC = 290). We hypothesize that this result arose because of the high sensitivity of such a likelihood function to the experimental measurements, especially at the low frequency of infected cells. In contrast, fitting the models to data using least squares (Equation (34)) provided fits of all models with identical quality. This result was driven by the need of the models to more accurately fit the data with high frequency of infected cells in leaves 6 and 7 at later time points, at the expense of poorer fits of other data. These fits, however, were not adequate due to non-normally distributed residuals as was observed when fitting Tromas et al. [14] model to data (see above). Finally, fitting the models to log-transformed data (and replacing the zero values with the LOD) provided a more graded classification of alternative models ( Table 2). In particular, three models (original and alternative models 1&2) assuming that virus dissemination starts from leaf 3 provided better fits (based on AIC) than the models assuming that spread starts from upper leaves (e.g., alternative model 7). Interestingly, the quality of the model fits deteriorated as the models assumed virus dissemination did not originate from leaf 3-i.e., the models in which dissemination started at leaf 5 or 6 fitted the data with better quality than the model in which dissemination started at leaf 7 (Table 2 and Supplemental Figure S3). This result suggests that the data on virus dissemination does contain the signal indicating the virus most likely starts spreading from leaf 3 upwards; however, the strength of such a statement from our mathematical modeling-based analysis is relatively weak. Thus, these experimental data do not provide strong evidence for a specific route of TEV dissemination in tobacco plants. There is some good news, however. Some parameters appear to be robustly estimated in all the models such as β and ψ k ; that is perhaps unsurprising given that these parameters determine within-leaf viral spread.

Alternative Models Incorporating Independent Replication or Immune Responses Are Also Consistent with Observed Viral Spread Kinetics
We tested two additional alternative models for how well they may describe the data. Alternative model 8 assumed that upon virus inoculation, virus disseminates to all leaves and then replicates in individual leaves independently of other leaves, as described by the logistic equation (Equation (16)). Alternative model 9 assumed that reduction in the fraction of susceptible cells in a leaf is not determined by the fraction of infected cells but by the time since infection (Equation (17)). The rationale for this modification is that it is possible that infection induces generation of a local or systemic immune response after a delay T k which renders uninfected cells resistant to infection [32]. Both of these alternative models fitted the data well based on SSR or AIC metrics (Supplemental Table S2). Interestingly, the time-dependent cell susceptibility model suggested that differences in how quickly cells become resistant is leaf-dependent (Supplemental Table S2); however, this could be due to differences in the timing of initiation of immune responses and/or virus"arrival" in a given leaf (determined by T k ) or the speed at which uninfected cells in the leaf are rendered resistant (determined by n k ). Taken together, our results strongly suggest that multiple pathways of TEV dissemination and growth in individual leaves in the tobacco plants are consistent with the data and additional experiments and/or data need to be involved to eliminate unreasonable models [37]. Table 2. Several alternative models provide similar fits of the virus spread data with different parameter sets. We fitted seven alternative models for virus spread kinetics (given in Equations (4)- (15)) to the data on viral spread in plants using least squares with a logarithmic transform (see Equation (35)). Along with parameter values for every model we provide the total error (SSR Log ), AIC LS Log , and ∆ AIC (difference in AIC between the model with the lowest AIC and all other models). We also show the results of the Shapiro-Wilk normality test (W and p value) applied to the residuals of the fitted models. N/A stands for "Not Available" for parameters that were not present in a given alternative model. Coloring indicates the difference in the quality of a model fit as compared to the best fit models (green-better fit, red-worse fit).

Odds Ratio Test Implies a Higher than Random Rate of Coinfection
Our modeling-based analysis so far and that of Tromas et al. [14] treated cells in our data as infected or uninfected. However, in their experiments Tromas et al. [14] measured the fraction of cells infected with either or both of two viral strains of TEV, Venus or BFP (see Section 2 for more detail). Virus coinfection may impact many facets of viral dynamics and growth. A paramount consequence of two or more virions infecting the same cell simultaneously is that it may result in production of recombinant variants, which has been well documented for human immunodeficiency virus (HIV) [38,39]. In particular, in acute HIV infection, variants representing recombinants of infecting/founding strains, arose rapidly within a few months; interestingly, a simple mathematical model predicted that accumulation of the variants can be simply due to random coinfection of the susceptible cells by two viral variants [40]. Dang et al. [33] investigated whether infection of CD4 T cells in culture occurs randomly by two different HIV variants, HIV-eGFP and HIV-IHSA. The authors proposed an odds ratio (OR) metric to estimate deviation of the rate of cell coinfection with two viruses as compared to single infections (Equation (41)). Interestingly, in all their experiments with 2 HIV strains and different types of target T cells OR > 1 (typically, OR = 2-8), suggesting that coinfections were observed more often than single infections [33]. The authors explained this result by variability in CD4 T cell susceptibility to infection with susceptible cells being more easily infected with the two variants. A similar result was found later in another study [41]. Given our rich dataset on the dynamics of coinfection of plant cells with two variants of TEV we calculated the OR (Equation (41)) for every leaf and every time point in our data.
Interestingly, we found very high values for OR for most of the data, all exceeding one, with many values being in the range 10-100 ( Figure 5). Note that in some cases, mostly for leaf 5, we could not calculate OR due to absence of coinfected cells ( Figure 5B). OR of 10 to 100 is much higher than that found previously for HIV [33]. There may be several reasons for that. First, it is possible that there is a high degree of variability in susceptibility of different plant cells to infection, and cells that are highly susceptible get infected with both variants easily. We also found that there is a significant decline in OR with time of infection for all but leaf 5; this decline is consistent with the hypothesis that initially highly susceptible cells are infected resulting in high OR which declines as more resistant cells are infected ( Figure 5). Alternatively, the mode of virus transmission within the leaf may have played a major role. Indeed, in plants viruses are transmitted from the infected cell to adjacent cells via plasmodesmata, and if a cell is coinfected with two variants, it is possible that all new infections occur by both variants simultaneously [35]. Finally, if infection of cells occurs sequentially, infection with one variant may suppress any potential antiviral activity in the cell, allowing that cell to be coinfected with another variant [42][43][44]. To further test these hypotheses we used mathematical modeling.

A Probability-Based Coinfection Model Performs Best Compared to Other Coinfection Models
Given that many alternative mathematical models are consistent with the pathway of systemic virus dissemination ( Table 2) to investigate potential mechanisms of TEV coinfection dynamics in different leaves we decided to fix the details of virus dissemination between leaves to those provided in the previous study [14], i.e., we let the virus infection to be initiated in the leaf 3 and dissemination to upper leaves to depend on the infection frequency of leaves below ( Figure 1A and Equations (1)-(3)). To describe how coinfected cells are generated we developed six alternative "coinfection" models (see Figure 2 for 2 examples, Equations (21)- (27), and Section 2 for more detail). In the first, 1-alpha coinfection model, dynamics of coinfected cells are driven only by the within-leaf frequency of cells infected with either of two variants with the parameter α determining deviations of the coinfection from random (Figure 2A and Equation (21)). A simple extension of this model was to allow for different efficacies of coinfection depending of which virus in-fected the susceptible cell first (Equation (22)). Two other models assumed that coinfection may happen via two different pathways: local, within-leaf infection dependent on the frequency of single-infected cells and uninfected cells and via between leaf virus dissemination, with either identical (α) or different (α 1 and α 2 ) weights for this coinfection processes ( Figure 2B and Equations (24) and (25)). Finally, the third set of two models assume that coinfection due to within-leaf dynamics occurs due to coinfected cells transmitting both viral variants to susceptible cells, and due to between-leaf dynamics occurs similarly as in the previous model. We similarly assume that these two processes may proceed with different deviations from a random process which is captured by parameters α 1 and α 2 (Equations (26) and (27)).
We fitted these models to experimental data using two alternative approaches, logtransformed least squares (with LOD replacements of zero values) and binomial distributionbased likelihood, both extended to account for singly and co-infected cells in each leaf (see Equations (33) and (37) in Section 2 for more detail). The 2-alpha Probabilistic model (Equation (22)) was the best performing model when fitted by either method (Table 3). Importantly, with both methods the basic models assuming that coinfections occur randomly, due to within-leaf coinfection of cells poorly described the data (Table 3 and Supplemental Figure S4). Table 3. The 2-alpha probabilistic model fits the coinfection data with best quality. We fitted a series of mathematical models (see Section 2 and Figure 2) that make different assumptions on how coinfection of individual cells with two different viruses occur to the data on viral spread. The models were fitted using the binomial distribution-based likelihood method (Equation (33)) or the least squares method with a log transform of the data (Equation (37)). AICs were calculated using Equations (38) and (40), for the likelihood and least squares methods respectively. Values for nll, AIC Lik , and AIC LS Log were rounded to the nearest whole number. ∆ AIC for both methods are calculated by taking the AIC score from the model and method in question and subtracting it from the lowest AIC in its corresponding row. Coloring follows the same principle as in Table 2. In fitting models using least squares to log-transformed data we used the following values for the limit of detection of the frequency of infected cells: LOD Venus = 8.43 × 10 −5 , LOD BFP = 3.26 × 10 −4 , and LOD Mixed = 3.2 × 10 −5 .

1-α Probabilistic 2-α Probabilistic 1-α Coinfection 2-α Coinfection 1-α Logistic 2-α Logistic Equation (24)
Equation ( We also fitted the models using the least squares method for raw, untransformed frequencies of infected cells, but these fits poorly described the dynamics of coinfected cells. We reasoned that this is because there are typically fewer coinfected cells than singleinfected cells, and this least squares method favored fitting the dynamics of single-infected cells with better quality (due to their higher abundance). In this specific case, a statistical model based on untransformed least squares does not appear to be adequate.
With both of the appropriate methods we found that the 2-alpha probabilistic model fits the data with best quality, and the next best, 1 alpha probabilistic model performed significantly worse (per AIC scores, Table 3). Indeed, the best fit model could very accurately describe the dynamics of single-and co-infected cells and predicted a more rapid increase in the coinfected cells for leaf 6 and 7 than that for single-infected cells ( Figure 6). Unfortunately, we found relatively wide confidence intervals for estimates of many of the model parameters except ψ k suggesting that the amount of data available was relatively low, and increasing the number of time points and/or plant repeats may have allowed for more precise estimates. We should note, however, that mean estimates for within-leaf infection rates β V and β B and between-leaf spread rates χ k were very similar to those found when fitting Tromas et al. [14] model to the data on infected cell dynamics (Table 1) lending some support that our model parameters are not unrealistic. Excitingly, we found that for within-leaf virus spread, coinfection rate was much higher than cell infection by single viruses (α 1 = 10.1) supporting our analysis using odds ratio ( Figure 5). The between-leaf coinfection rate was not different from the random model (α ≈ 1) suggesting that most coinfection events were driven by within-leaf dynamics and not due to transfer of viruses systemically. This, perhaps, makes sense because locally it is easier for one cell to be coinfected by 2 viruses while when viruses enter the leaf at random locations due to systemic dissemination, coinfection is expected to be rare.
Both probabilistic models assume that the dynamics of coinfection within the leaf depends on the product of frequency of cells infected with either of two viral variants and the frequency of uninfected cells in the leaf (Equation (25)). We found that removing S k term in these models resulted in significantly poorer fit of the data (see file AllModelsAnd-Methods.xlsx in https://github.com/Plant-Virus-Spread/Models-And-Tools/, accessed on 14 November 2021). Intuitively, the frequency of uninfected cells drives the dynamics of infection and when S k approaches 0, infection of the leaf mostly stops, thus overpredicting the data. However, when such a term is absent in the equation for coinfected cells M k , co-infection would proceed even when single infections stop. With this mechanistic/mathematical insight it was difficult to come up with a biological explanation for why coinfections are dependent on the frequency of uninfected cells. One possibility that infections stop not because the number of uninfected cells declines to zero, but because of leaf-specific immune response makes uninfected cells in the leaf resistant to infectionsimilar to the alternative model 9 for the dynamics of infected/uninfected cells that we considered earlier (Equation (17)).
We found it interesting that the model in which the frequency of coinfected cells due to within-leaf dynamics grows logistically (Equations (26) and (27)) could not well describe the data ( Table 3). The model underestimated the frequency of coinfected cells at early time points. This result argues that the high odds ratio for the coinfection of cells observed in our data is not likely to arise exclusively due to adjacent cells being coinfected with the two TEV variants at once. This model prediction can be tested experimentally, for example, by using microscopy and examining spatial distribution of foci of cells infected with individual viral variants or with both variants [45].

Dynamics of Coinfected Cells Compared to Singly-Infected Cells
While our analysis provided solid evidence that coinfection of plant cells by two TEV variants does not proceed randomly we sought to investigate how coinfection rate varies with the frequency of single-infected cells. Previous mathematical modeling-based work on HIV infection of target cells suggested that the frequency of doubly-infected cells should scale as square of the frequency of single-infected cells [46]. As far as we are aware such prediction has not been tested for plant-infecting viruses. For every leaf we therefore plotted the relationship between the frequency of coinfected cells versus the frequency of cells infected with Venus ( Figure 7A-D) or BFP ( Figure 7E-H) strains of TEV and compared these data with predictions of the two alternative probabilistic models. We also fitted a line to log-log transformed frequencies and estimated the slope n of the relationship (Figure 7). Several interesting results emerged.  (18), (19) and (25)) assumes that coinfection of individual cells by two different strains depends on the level of uninfected cells in the leaf (S k ) and that local coinfections in the leaf occur at different kinetics that coinfection between leaves ( Figure 2B). We fitted this model to the data on infection of cells by either individual viruses or coinfection of the same cell by different viruses. The model was fitted using the binomial distribution-based likelihood method (Equation (29) First, we found that the relationship between frequency of coinfected cells and cells infected with a single virus is either sub-linear or linear for lower leaves (leaves 3 and 5, respectively, Figure 7A,B,E,F). This is not fully consistent with the results found using odds ratio ( Figure 5A,B) suggesting that different ways of data analysis may result in different conclusions. However, for upper leaves we found strong deviation from the linear relationship whereby coinfection frequency increased more rapidly than linearly with increasing frequency of single-infected cells (n > 1, Figure 7C,D,G,H). This is consistent with what we found using odds ratio ( Figure 5C,D). Predictions of our best fit 2-alpha probability model were mostly consistent with the data except for the leaves 6-7 and cells, singly infected with BFP variant (Figure 7G,H). Finally, we noticed that at later time points (∼ 7-10 days post infections), all of the curves in Figure 7 approximate lines. To understand why this occurs, and what the slopes of these lines are, we performed additional analyses (shown in Supplemental Information).  (25)) and dash-dotted lines are the predictions of 1-alpha probabilistic model (Equation (24)). The red dotted lines are power functions fitted to the data. The exponents n of these functions are shown in the top right corners below the leaf number. Asymptotic relationship between the frequency of coinfected and single infected cells appears to become a straight line for all three models, a feature which we examine in Equations (S41)-(S48) in Supplemental information. We also provide a python-based script in which these data and model predictions can be explored further (Supplemental Figure S5).

Discussion
In this paper we performed extensive analyses of the published data on the kinetics of infection of tobacco plants with two variants of TEV [14]. We found that the pathway of virus dissemination in the plant could not be robustly determined directly from the data on the change in frequency of infected cells in different leaves over time-several alternative models that assumed slightly different pathways of dissemination fitted the data with very similar quality. The model assuming that viral dissemination starts from the upper leaf 7, however, fitted the data poorer than the models assuming that dissemination starts with lower (3rd) leaf suggesting that these data do contain some information on the direction of virus dissemination. The best performing model in our analysis was dependent on the method of how the models were fitted to data; fitting the models using binomial distribution-based likelihood (Equation (29)) suggested that alternative model 2 (Equation (6)) was the best (Supplemental Table S1). On the other hand, when total infection models were fitted using the least squares method based on log-transformed data and model predictions (Equation (35)), Tromas et al. [14] model and alternative models 1&2 (Equations (4) and (6)) fitted the data with the best quality ( Table 2). The way experimental measurement errors influence the data remains poorly understood, and therefore, which statistical model-logtransformed least squares or binomial distribution-based likelihood-are more appropriate in fitting the models to such data remains undefined. The way forward is to understand better sources of errors in experiments measuring the fraction of infected protoplasts by flow cytometry.
It is generally unknown why not all cells in the leaves were infected 10 days post infection; for example, leaf 3 had less than 10% of its cells infected by the end of experiment ( Figure 3). One possibility is that not enough time has passed for all cells to be infected. Tracking virus infection at longer than 5.5 week periods may be complicated because at this time plant physiology changes dramatically due to development of flowers. Tromas et al. [14] assumed that infection stops after the fraction of infected cells in a leaf exceeds some critical value ψ k , but how the physiological aspects of the plant, or the virus infecting it, determine the value of ψ k remain a mystery. We showed that an alternative model in which infection of a given leaf slows down due to a time-dependent factor and not directly due to increase in the fraction of infected cells, can describe the data with similar quality (Supplemental Table S2). Such time-dependent factors may be an immune response such as RNAi generation and dissemination via plasmodesmata that may render cells in the leaf resistant to infection. Another factor could be changes to plasmodesmata themselves, like the accumulation of callose at the pores, that prevent the local cell-to-cell movement of the virus [47]. Physiological changes in the leaves in a growing plant may also contribute to the increased resistance of some plant cells to infection.
Our main findings, however, are about coinfection of cells with two different variants of TEV. Interestingly, by using odd ratio metrics [33] we found significantly a higher frequency of coinfections of leaf cells by two viruses, in some cases with OR = 100 or more that is much higher than that observed in other systems [33] and is in contrast with another study finding suppression of coinfections [45]. Importantly, we developed a series of novel mathematical models that track the coinfection dynamics; the best fit model also predicted higher rates of coinfections of plant cells with two viruses for the within-leaf virus spread but not for virus dissemination to other leaves ( Figure 6). Additional analysis showed that at least for the upper leaves (leaf 6&7) the frequency of coinfected cells increases more rapidly than linear with frequency of single-infected cells (Figure 7), and we show analytically that this is not expected in the random infection model. It has been proposed that deviation of coinfection frequency from random is likely to result from heterogeneity in target cell susceptibility to infection [33]. However, given the mechanics of virus spread in plants via plasmodesmata, ability of multiple viruses to enter the same cell, and thus increase chances of coinfection, remains a possibility (this mechanism did not fit the data with best quality, see Table 3) [48,49]. Given that virus coinfection of leaf cells in other systems can be high and that virus coinfections may result in higher virus production by infected cells [50,51], impact of coinfections on virus evolution has received considerable attention [52][53][54].
As far as we are aware, Tromas et al. [14] performed the first comprehensive analyses of virus dissemination in plants, and so far, no similar works (experiments and modeling) on virus dissemination within and between multiple tissues have not been performed in animals. However, several studies have investigated how, for example, hepatitis C virus (HCV) spreads locally in the liver [55][56][57]. There is also evidence for local spread of influence A virus in humans and animals (reviewed in [58]), and mathematical models that take into account physiology of the lung tissue to study virus spread have been proposed [59]. Our observation of potential cooperativity between viruses infecting individual cells extends the results found with animal viruses such as HIV or vaccinia virus [33,46,60,61]. Our analyses thus illustrate that additional insights can be generated by experiments in which infection accumulation (and loss) are tracked over time systematically in the whole organism; using barcoded viruses may be particularly useful in this regard [62].
Our study has several limitations. In our analysis we ignored the complexity of the growing, 4 week old tobacco plants, and changes that occur with leaves in the growing plant. Plants do not have pumping systems like animals, and therefore systemic movement of viruses must follow the already established pathways provided by the phloem, typically, from source to sink tissues such as developing leaves. However, it is not always obvious based on visual appearance when a given leaf changes from being a sink to being a source (or vice versa). Viruses can manipulate source-sink relationships in their hosts; e.g., some viruses can convert source tissues into sinks [27,63]. While we had information on the fraction of infected cells in different leaves, spatial aspects of the infection process were lost during protoplast extraction. Better understanding of virus dissemination kinetics is likely to benefit when such spatial details are also recorded, along with the high throughput flow cytometry-based measurements.
While we provided evidence that coinfection occurs at higher frequency than predicted by the random infection hypothesis, we were unable to provide a solid explanation for this effect. Variability in susceptibility of cells to viral infection, local, cell-to-cell virus transmission via plasmodesmata, or cooperation between viral variants may be contributing.
We showed that inference of the best fit model depends on the method used to fit models to data. Given limited understanding of the sources of errors in these data, the most appropriate statistical models that take into account measurement errors will need to be developed. In particular, by fitting the models to data using binomial distributionbased likelihood we found large differences in quality of how alternative models fitted the data (based on AIC values). We hypothesize that binomial distribution-based likelihood amplifies small differences in the infection frequency of individual leaves at early time points, leading to significant favoring of one model over the other. However, this method does not truly account for experimental noise in extraction efficiency of protoplasts from the leaves and false positives when detecting fluorescence signals from individual cells by flow cytometry. Therefore, we believe that the finding that there is one best fit model among the alternative models when models are fitted using binomial distribution-based likelihood is insufficient to choose a specific model. Additional experiments that better address experimental errors in measuring the fraction of infected cells in different leaves will be needed to derive a better statistical model to fit our dynamical models to such data.
Similarly to Tromas et al. [14] we ignored the fact that infection occurs in a plant, and pooled all infection-per-leaf data together without tracking infection per plant. It is clear, however, that some plants may have more infection in all leaves than others (e.g., Supplemental Figure S1) and fitting the models to such "paired" data may provide additional insights into details of viral spread locally and systemically. Finally, we showed that the pathway of virus dissemination in plants cannot be easily determined from experiments that measured virus accumulation in different leaves over time, although this result was dependent on the way the models were fitted to data.
Biases introduced by extraction of protoplasts for use with flow cytometry, remain unclear. For example, infected cells may preferentially die during the extraction process which would reduce the fraction of infected cells measured. In immunology, one potential way to understand such biases has been by comparing the flow cytometry-based measurements with microscopy-based measurements [64,65].

Conclusions and Future Directions
Our study opens avenues for future research. In particular, similar analyses may need to be performed for other plant viruses. TEV is a potyvirus, one of the largest classes of viruses in plants, [66], and together with the geminiviruses, they are responsible for the majority of disease in commercial agriculture. Understanding how these viruses disseminate in their hosts may bring practical benefits through improved interventions (e.g., [67]). To understand better details of local virus dissemination it will be necessary to combine measurements of spatial virus spread in individual leaves with flow cytometrybased measurements of the fraction of infected cells. Local virus spread can be measured by confocal microscopy and larger spread by light microscopy [26,45,68,69]; previous studies have developed frameworks of how such local viral spread may be modeled [57]. Future studies should better understand why infection of a given leaf stops when not all cells are infected. Whether this is related to changes in leaf physiology (moving from sink to source) or immune responses in the leaf or systemically needs to be tested in experiments and modeled appropriately. Whether measurement of infection in leaves is sufficient to accurately predict virus dissemination kinetics is unclear. For example, roots are typical sink tissues in plants [27]. Thus, it is likely that virus accumulation in the roots precedes or coincides with systemic virus dissemination to upper above-ground structures. Future experiments and modeling studies may benefit to include the dynamics of virus-infected cells in the plant roots. Finally, more precise understanding of the pathway of virus dissemination will benefit from additional data in which infection is initiated in different leaves and experiments in which some leaves are removed after a specific time period (e.g., [28]). Such experiments are not without caveats because removing a leaf may induce systemic changes in the plant that in turn may influence virus dissemination kinetics. Therefore, such experiments would be helped by mathematical models that can make quantitative predictions on the impact of different leaf removal on the virus dissemination kinetics, and these models can be tested and some falsified [37]. Ultimately, a combination of well designed experiments to test specific hypotheses and quantitative mathematical models is likely to bring novel insights into how viruses disseminate in their plant hosts. Such knowledge will be critical for development of novel strategies for limiting agricultural losses to viruses.

Data Availability Statement:
The data for our analyses have been provided by S. Elena [14]. Formatted data are available with this publication as a supplement (csv file) and via Github (https: //github.com/Plant-Virus-Spread/Models-And-Tools/ (accessed on 14 November 2021)). We performed most of our analyses in python and provide several key codes to ensure reproducibility of our results on github (https://github.com/Plant-Virus-Spread/Models-And-Tools/ (accessed on 14 November 2021)). Specifically, we provide the codes to (1) plot the predictions of Tromas et al. [14] model with their parameter estimates, (2) fitting [14] model to the data using either binomial distribution-based likelihood or least squares, (3) fitting the 2-alpha probabilistic model to the coinfection data using binomial distribution-based likelihood, and (4) sliders code allowing to explore the relationship between coinfected and singly-infected cells (in the data and predictions of 2-alpha probabilistic model). Tobacco etch virus BFP blue fluorescent protein nll negative log-likelihood LS least squares LOD limit of detection AIC Akaike Information Criterion SSR sum of squared residuals OR odds ratio