Predicting the Trajectory of Replacements of SARS-CoV-2 Variants Using Relative Reproduction Numbers

New variants of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with high effective reproduction numbers are continuously being selected by natural selection. To establish effective control measures for new variants, it is crucial to know their transmissibility and replacement trajectory in advance. In this paper, we conduct retrospective prediction tests for the variant replacement from Alpha to Delta in England, using the relative reproduction numbers of Delta with respect to Alpha estimated from partial observations. We found that once Delta’s relative frequency reached 0.15, the date when the relative frequency of Delta would reach 0.90 was predicted with maximum absolute prediction errors of three days. This means that the time course of the variant replacement could be accurately predicted from early observations. Together with the estimated relative reproduction number of a new variant with respect to old variants, the predicted replacement timing will be crucial information for planning control strategies against the new variant.


Introduction
Since its first emergence in the human population in 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been generating new variants. Natural selection selects new variants that have higher effective reproduction numbers than other circulating variants. As a result, the average transmissibility in the viral population increases over time [1]. The emergence and replacement among variants of concern (VOCs), Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Delta (B.1.617.2), and Omicron (B.1.1.529) [2] are the process of natural selection.
It is important to know the transmissibility of new variants in comparison with previously circulating variants because the average reproduction number of the circulating virus changes when new variants become dominant. Several studies have analyzed the reproduction numbers of new variants that have replaced old ones. Volz et al., estimated the effective reproduction number of Alpha in England to be 1.5-2.0 times higher than that of non-VOCs using a logistic growth model for relative variant frequencies [3]. Leung et al., estimated the basic reproduction number of Alpha to be 1.75 times higher than that of previously circulating variants in England using a renewal-equation-based model [4]. Ito et al., estimated the effective reproduction number of Delta to be 1.35 times higher than that of Alpha from relative variant frequencies observed in Japan by using an approximated version of the renewal-equation-based model [5]. Using the same method, Ito et al., estimated the effective reproduction number of Omicron to be 3.15 times higher than that of Delta in Denmark [6], and Nishiura et al., estimated the effective reproduction of Omicron to be 4.2 times higher than that of Delta in South Africa [7].
In order to prepare control measures against new variants, it is crucial to predict the trajectory of the variant replacements in advance. The prediction of variant selection has been widely studied in seasonal influenza viruses [8]. Łuksza and Lässig developed a fitness model using mutations on epitopes and non-epitopes to predict selected variants [9]. Huddleston et al., predicted the future relative frequency of variants using its current relative frequency, the antigenic novelty of epitopes, and the mutational load in nonepitopes [10]. Piantham and Ito modeled the fixation probability of variants using relative variant frequency and statistics on patient ages [11]. In the case of seasonal influenza, the main driving force of natural selection was the population immunity acquired from previous infections. In contrast, most of the human population were considered naïve to SARS-CoV-2 at the beginning of the pandemic, and a method to predict the trajectory of variant replacements in the early stage of the pandemic can be simpler than those assuming pre-existing immunity from previous infections.
The transmissibility of an infectious agent can be measured by its reproduction number. The effective reproduction number at time t (R t ) is defined as the average number of people someone infected at time t could expected to produce if conditions should remain unchanged [12]. When more than one variant of the infectious agent is circulating, the relative reproduction number can be used to measure the relative transmissibility of a variant compared to a baseline variant [4,13]. However, the method requires the numbers of new cases in addition to the relative frequencies of variants, and it is not applicable for predicting variant replacement in the future. Using approximations, Ito et al., proposed a method to determine the relative reproduction number without knowing the number of new cases [5]. This method allows us to predict the future trajectory of variant replacements.
Nucleotide sequences of SARS-CoV-2 variants have been collected worldwide and accumulated in the GISAID database [14]. It is known that different geographical locations have different distributions of variants [15]. As of 28 September 2022, a total of 13,283,666 sequences have been registered on the database worldwide. Of these, 2,286,890 (17.2%) were submitted from England, which has their population account for 0.71% of the world population. These numbers indicate that England is one of the locations with the highest sequencing capacity. In England, the Alpha-Delta replacement was observed from March 2021 to June 2021. The sequence information during the Alpha-Delta replacement in England is one of the best datasets to evaluate the predictability of variant replacement in SARS-CoV-2.
In this study, we conduct retrospective prediction tests using the nucleotide sequences collected in England during the Alpha-Delta replacement. For each given time point, we use partial sequence data observed only up to that time point to estimate the relative reproduction number of Delta with respect to (w.r.t.) Alpha. The estimated relative reproduction number is then used to predict the future trajectory of variant replacement. The estimated relative reproduction numbers and the predicted trajectories are evaluated by being compared to those estimated using the entire dataset.

Nucleotide Sequences
Nucleotide sequences of SARS-CoV-2 viruses collected from England from 1 January 2021 to 31 July 2021 were downloaded from the GISAID database on 16 November 2021. Of these, 411,123 sequences had complete information about the date of sample collection in the metadata. The PANGO lineage names [16] of these sequences were collected from metadata and recorded with their collection dates (Table S1). Sequences that are labeled as "B.1.1.7" or sublineage names starting with "Q." were classified as the Alpha variant. Sequences that are labeled as "B.1.617.2" or sublineage names starting with "AY." were classified as the Delta variant. There were 11,773 sequences (2.9%) of lineages other than Alpha and Delta, and these were ignored in subsequent analyses. A total of 399,350 sequences of Alpha (192,250) and Delta (207,100) were used for counting the daily numbers of sequences belonging to Alpha and Delta ( Figure S1).

Model of Advantageous Selection
We estimated the relative reproduction number of a variant w.r.t. a baseline variant using an approximated version of the renewal-equation-based model [5]. Let X and Y represent variants circulating in the population and q X (t) and q Y (t) denote relative frequencies of X and Y at calendar time t, respectively. Suppose that variant X was dominant at t 0 and variant Y was introduced into the population at that time with an initial relative frequency of q Y (t 0 ). We assume that the effective reproduction number of variant Y was k times higher than that of variant X and that k is constant over time. Let f (τ) be the probability density function of generation time τ for SARS-CoV-2 infections. We assume f (τ) follows the gamma distribution with a shape parameter of 3.42 and a scale parameter of 1.36 [17]. We discretize f (τ) to g(j) = j j−1 f (τ)dτ for 1 ≤ j ≤ 19. We truncate the generation time distributions at τ > 20 and set g (20) = ∞ 19 f (τ)dτ so that ∑ 20 j=1 g(j) = 1. Let I(t) be the total number of new infections by either X or Y at calendar time t. Based on Fraser's time-since-infection model [12], the effective reproduction numbers of variant X and Y can be calculated as and Since the effective reproduction number of variant Y is k times higher than that of variant X, the effective reproduction number of variant Y at time t is given by Assuming that the viral population at time t comprises only variants X and Y, the relative frequency of variant Y at calendar time t, q Y (t), can be calculated as We assume that the numbers of new infections do not vary greatly for 20 days, i.e., For t > t 0 . In our previous publication using SARS-CoV-2 data from Denmark, we compared models using Equation (5) with models not using Equation (5). As a result, models using approximation with Equation (5) had lower AIC than their corresponding models without the approximation, suggesting that approximation using Equation (5) gives a better model than that without Equation (5) by eliminating noise in observed I(t) [18]. Using this approximation with Equations (1)-(3), we can rewrite Equation The average reproduction number of circulating viruses can be determined by the expected value of the reproduction numbers of circulating variants. Since the relative reproduction number of X is 1, and that of Y is k, the average relative reproduction number of circulating viruses at time t w.r.t. variant X is given by

Parameter Estimation from the Number of Sequences
Let N X (t) and N Y (t) be the number of sequences of variant X and Y observed at calendar time t, respectively. Suppose that variant Y is sampled and sequenced following a betabinomial distribution having distribution parameters of α = q Y (t)M and β = (1 − q Y (t))M, where M = α + β. The parameter M represents the sum of the two shape parameters of the underlying beta distribution and it determines how proportions of variants vary during sampling. Note that this beta-binomial distribution has a mean of (N X (t) + N Y (t))q Y (t) and a variance of . The beta-binomial distribution becomes the binomial distribution when M = ∞. To reduce computational time, the upper limit of M is set to 2000. When q Y (t) = 0.5 and M = 2000, the first and third quartiles of the beta distribution are 0.492 and 0.508, respectively. The following equation gives the likelihood function of parameters k, q Y (t 0 ), and M for observing N X (t) and N Y (t) sequences of variants X and Y at calendar time t: . The likelihood for observing N Y (t) sequences of variant Y during the period on calendar times t 1 , . . . ,t n is given by the product of the above formula for 1 ≤ t ≤ n.
We consider Alpha and Delta as variants X and Y, respectively. N X (t) and N Y (t) are the numbers of sequences of Alpha and Delta in England at calendar time t, respectively. The date of the first introduction of Delta, t 0 , was set to the first date when N y (t) > 1 (18 March 2022). The estimates of k, q Y (t 0 ), M, and q Y (t) were obtained by maximizing the likelihood function from t = t 0 until the latest t in which q Y (t) < 1 (4 July 2021). We used the Sbplx algorithm in the NLopt library to determine the maximum likelihood [19,20]. The 95% confidence intervals (95% CI) of k, q Y (t 0 ), and M were determined using the profile likelihood method [21]. Augmented Lagrangian algorithm [22] in the NLopt library was used to determine 95% CIs using Sbplx as the subsidiary optimization algorithm. From the maximum likelihood estimates of k and q Y (t), the average relative reproduction number of circulating viruses w.r.t. Alpha at time t was estimated from Equation (7). The 95% CIs of q Y (t) and the average relative reproduction number of circulating viruses w.r.t. Alpha at time t were determined using combinations of parameters within 95% confidence region [21].

Prediction of Relative Variant Frequency and Average Relative Reproduction Number
Relative frequencies of Delta and average relative reproduction numbers of circulating viruses w.r.t. Alpha in the future were predicted using the maximum likelihood estimates of parameters calculated from early observations. For each proportion p = 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, and 0.95 we determined the calendar times s p when the estimated relative frequency q Y s p exceeded p using the maximum likelihood estimates calculated with the entire observations from 18 March to 4 July 2021. For each date s p determined above, we calculated the maximum likelihood estimates of k, q Y (t 0 ), and M using observations no later than s p . Relative frequencies of Delta and average relative reproduction numbers of circulating viruses w.r.t. Alpha in the future were predicted by substituting k and q Y (t 0 ) in Equations (6) and (7), respectively. The 95% CIs of q Y (t) and the average relative reproduction number of circulating viruses w.r.t. Alpha at time t > s p were determined using combinations of parameters within 95% confidence region estimated from observations at time t ≤ s p .  Figure 1a shows the observed and estimated relative frequencies of Delta during the Alpha-Delta replacement in England. The blue curve and black curves around the blue curve represent the maximum likelihood estimates and 95% CI of relative frequencies of Delta. The gray area represents 95% equal-tailed intervals of the beta distribution with the parameters q Y (t)M and (1 − q Y (t))M. Figure 1b shows the maximum likelihood estimates and 95% CI of the average relative reproduction number of circulating viruses w.r.t. Alpha during the same period. Dashed vertical lines in both panels indicate the dates when relative frequencies of Delta exceeded each 0.05 increment from 0.05 to 0.95 (Table 2). It took 47 days for Delta to reach relative frequencies of 0.05 (21 April 2021) to 0.95 (7 June 2021).   Table 3 shows the parameters of our model estimated using the partial data collected no later than each of the dates in Table 2. The final estimate of k using observations of the entire period in the Alpha-Delta replacement was 1.88 (Table 1). The final estimate was within 95% CIs of estimations in seventeen out of nineteen estimations using the partial observations. Only the two early estimations, made at relative frequencies of 0.05 and 0.10, failed to cover the final estimate of k in their 95% CIs. All 95% CIs of k estimated at relative frequencies greater than or equal to 0.15 covered the final estimate of k. These results implied that it was possible to accurately estimate the relative reproduction number of Delta w.r.t. Alpha when relative frequencies of Delta became 0.15 or later. It took 38 days for Delta to reach a relative frequency of 0.95 (7 June 2021) from when it was 0.15 (30 April 2021) ( Table 2). Therefore, we would be able to estimate the relative reproduction number of Delta w.r.t. Alpha more than one month before its fixation.

Prediction of Relative Variant Frequency in Future
We conducted retrospective prediction tests on the future relative frequency of Delta and the average relative reproduction number of circulating viruses w.r.t. Alpha using model parameters in Table 3, which were estimated from partial observations. Figure 2 shows predicted trajectories of the Alpha-Delta replacement using partial observations up to different time points in Table 2. The maximum likelihood predictions made at relative frequencies of 0.05 and 0.10 overestimated the future relative frequencies of Delta (Figure 2a,b), while predictions made at relative frequencies greater than or equal to 0.15 fitted well with future observations (Figure 2c-i).
According to the final estimate using the entire observations, Delta exceeded relative frequencies of 0.50, 0.70, and 0.90 on 14 May, 20 May, and 1 June 2021, respectively ( Table 2). We evaluated the accuracy of predictions by analyzing predictions targeted on these dates ( Figure 3). When predictions were made before the relative frequencies of Delta reached 0.15, the relative frequencies of Delta on the target dates were overestimated (Figure 3a-c) and the dates predicted to exceed target relative frequencies were earlier than the final estimates (Figure 3d-f). The reason for these was that these early predictions overestimated the relative reproduction numbers of Delta w.r.t. Alpha (Table 3). In contrast, predictions made when relative frequencies of Delta were greater than or equal to 0.15 were close to the final estimate of relative frequencies (Figure 3a-c) and dates (Figure 3d-f). When relative frequencies of Delta were greater than or equal to 0.15, the predicted relative frequencies targeted on 14 May, 20 May, and 1 June 2021 had median errors of 0.060 (n = 7), 0.023 (n = 11), and 0.004 (n = 15) with maximum absolute errors of 0.092 (n = 7), 0.060 (n = 11), and 0.034 (n = 15), respectively (Table 4). With the same setting, the predicted dates exceeding targeted relative frequencies of 0.50, 0.70, and 0.90 had median errors of 1 (n = 7), 1 (n = 11), and 1 (n = 15) days with maximum absolute errors of 2 (n = 7), 2 (n = 11), 3 (n = 15) days, respectively (Table 5).

Discussion
We analyzed the replacement from the Alpha variant to the Delta variant in England using nucleotide sequences on the GISAID database collected from 18 March to 4 July 2021. The estimated relative reproduction number, k, of Delta w.r.t. Alpha was 1.88 (95% CI: 1.85-1.91) with a beta-binomial distribution parameter (M) of 288.54 (95% CI: 202.96-406.26) ( Table 1). The relative reproduction number of Delta w.r.t. Alpha was accurately estimated from early observations once the relative frequencies of Delta reached 0.15 (Table 3). Using these estimates of the relative reproduction number, the date when the relative frequency of Delta would reach 0.90 was predicted with a maximum absolute prediction error of three days (Table 5).
Several studies have estimated the relative reproduction of Delta w.r.t. Alpha in different countries. Ito et al., estimated the relative reproduction number of Delta w.r.t. Alpha in Japan to be 1.35 [5]. Hansen estimated the relative reproduction number of Delta w.r.t. Alpha in Denmark to be 2.17 [23]. In this study, the relative reproduction number of Delta w.r.t. Alpha was estimated to be 1.88 (95% CI: 1.85-1.91) ( Table 1). Figgins and Bedford found that the relative reproduction number of Delta and Alpha w.r.t. non-VOC variants in the United States were different depending on the states [24]. The differences in relative reproduction numbers of Delta w.r.t. Alpha among countries or states may be attributed to the differences in vaccine usage or the ethnicity of the target populations.
Our model assumes that the sequences on the GISAID database were sampled following a beta-binomial distribution. We can use the binomial distribution in the model instead of the beta-binomial distribution. The model using beta-binomial distribution resulted in lower Akaike information criteria (AIC) compared to the model using binomial distribution (Table S2). This means that the observed variance was larger than the variance of the binomial distribution. The additional variance to the binomial distribution may be attributed to the difference between relative variant frequencies among subpopulations, indicating that the target population was not well-mixed. For example, different regions may show different progresses in the variant replacement. The same may be true for different age groups.
It is possible to use a logistic regression model to fit the trajectory of the relative frequency of variants to achieve the same purpose as in this study. The renewal-equationbased model described in this study resulted in a lower AIC than using a multinomial logistic regression model [25] as shown in Table S3. This suggested that the renewalequation-based model is a better model than the logistic regression-based model for this dataset.
The lockdown restrictions in the UK were relaxed on 17 May 2021. The relative reproduction number estimated using data up to 28 May 2021 was slightly lower than the final estimation using entire observations (Table 3). This is attributed to the decrease in the relative frequency of Delta and the increase of the relative frequency of Alpha around 28 May 2021 (Figure 1). The underlying mechanism of the drop in the relative frequency of Delta after the relaxation of lockdown is unknown and needs to be further investigated.
Prediction tests conducted in this study used the date of sample collection and did not consider the delay in sequence submissions. Nucleotide sequences collected during the period from 18 March 2021, the introduction of Delta, to 13 May 2021, the date when Delta reached a relative frequency of 0.95, were submitted after 10.78 days on average with a standard deviation of 3.76 days. Indeed, 95% of sequences during this period were submitted to the GISAID database within 16 days after sample collection. This means real-time prediction may need additional 10-16 days to achieve a prediction accuracy similar to the results shown in this study.
Our model assumes that there was no difference between the generation times of both variants with a mean value of 4.64 days [17]. However, Hart et al., estimated the generation time of Delta (4.7 days) to be shorter than that of Alpha (5.5 days) [26]. To allow differences between generation times of variants, it is necessary to extend the model to also estimate the relative generation times of the variant w.r.t. that of the baseline variant [18].
During the period of analysis, the percentage of people receiving two-dose vaccinations in England was increasing from 7% on 1 April to 39% on 1 June 2021 [27]. Our model assumes that there was no difference between vaccination efficacy against Alpha and Delta. If there was a difference between vaccine efficacies against Alpha and Delta, the relative reproduction number of Delta w.r.t Alpha would change as population vaccine coverage increases. Detection of the difference in vaccine efficacy among variants may be possible by analyzing the temporal change of relative reproduction number using vaccination coverage data if there is a sufficient difference.

Conclusions
In this study, we showed that the relative reproduction number of Delta w.r.t. Alpha in England and its future trajectory of replacement could be predicted one month before it reached a relative frequency of 0.90. Public health policy-makers would have had only one month to prepare control measures for the increase in viral transmissibility. This implies that a quick decision-making process is needed to take advantage of the prediction. This period can be extended if accurate predictions are available using data earlier than one month. Further research is needed to investigate how much additional information was required for obtaining accurate predictions using data earlier than one month.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/v14112556/s1, Figure S1: Daily relative variant frequencies of Alpha (red), Delta (blue), and other variants (gray) in England from 1 January 2021 to 31 July 2021 calculated from nucleotide sequences on the GISAID database; Table S1: Metadata of nucleotide sequences of SARS-CoV-2 viruses collected from England during 1 January 2021 to 31 July 2021; Table S2: Parameters estimated by the binomial distribution model and comparison of AIC values with those of the beta-binomial distribution model; Table S3: Comparison between AIC values of the renewal-equation-based model and that of the logistic regression model estimated from entire observation.
Funding: This research was funded by the Japan Agency for Medical Research and Development, grant numbers JP20fk0108535 and JP21wm0125008. K.I. received funding from JSPS KAKENHI (21H03490). C.P. was supported by the World-leading Innovative and Smart Education Program (1801) from the Ministry of Education, Culture, Sports, Science, and Technology, Japan. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable. : Table S1 contains metadata of all nucleotide sequences of SARS-CoV-2 viruses collected from England from 1 January 2021 to 31 July 2021 used for the analysis.