# Virus Quasispecies Rarefaction: Subsampling with or without Replacement?


## Abstract


## 1. Introduction

^{12}[4,5].

## 2. Methods

Concept | Definition |
---|---|
Rarefaction | A technique used to compensate for different intensities of sampling in diversity studies. |
Subsampling cycle | The successive random extraction of a given number of items from a sample, lower than the sample size, with or without replacement at each extraction. |
Subsampling with replacement | An element is randomly extracted from the sample, identified, and then immediately replaced. The same element can therefore be drawn again in later extractions of the same subsampling cycle. |
Subsampling without replacement | All extractions in a subsampling cycle are performed without replacement, so no item may be extracted multiple times in the same cycle. |
Downward bias | Inaccuracy in measurement or estimation that underestimates the true value. |
Subsampling fraction | Fraction of reads being subsampled from a given sample in a single resampling cycle. |
Granularity | Level of resolution at which the data are processed when estimating frequencies from counts. |
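The two subsampling schemes defined above can be illustrated with a minimal Python sketch. The function name `subsample_cycle`, the toy sample of 1000 singleton reads, and the seed are illustrative assumptions, not the authors' implementation.

```python
import random

def subsample_cycle(reads, fraction, replace, rng):
    """One subsampling cycle: draw round(fraction * len(reads)) reads."""
    size = round(fraction * len(reads))
    if replace:
        # Each extracted read is put back, so it can be drawn again.
        return rng.choices(reads, k=size)
    # No read can be extracted twice within the same cycle.
    return rng.sample(reads, size)

rng = random.Random(1)            # fixed seed, for reproducibility only
reads = list(range(1000))         # toy sample: 1000 reads, all singletons
with_rpl = subsample_cycle(reads, 0.5, True, rng)
no_rpl = subsample_cycle(reads, 0.5, False, rng)

# Distinct haplotypes seen under each scheme.
print(len(set(with_rpl)), len(set(no_rpl)))
```

Both subsamples hold 500 reads, but only the one drawn without replacement is guaranteed to contain 500 distinct haplotypes; the shortfall under replacement is the downward bias defined in the table.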

## 3. Results

- All singletons: This represents a quasispecies where all haplotypes are represented by a single read. It serves as the simplest case to numerically show the discussed limitations, showcasing the most significant differences between the sampling schemes.
- Single dominant case: This hypothetical scenario involves a dominant haplotype, while all other haplotypes are singletons. Our goal is to evaluate the master frequency and the number of haplotypes.
- Prominent haplotypes: In this case, there are six prominent haplotypes along with a set of singletons. The objective is to evaluate the frequencies of the prominent haplotypes, the fraction of singletons in the quasispecies, and the fraction of reads in haplotypes that have more than one read but rank below the top six; the latter are singleton replicates produced by sampling with replacement.
- No rare haplotypes: This is a quasispecies composed of a master haplotype at 90%, with 10 other haplotypes at 1% each. This scenario excludes singletons and lower frequency haplotypes. We seek to estimate haplotype frequencies by repeated subsampling.
- Flat quasispecies: Similar to the first case, all haplotypes have equal frequencies, with the same number of reads per haplotype, set from 1 to 10, representing a perfectly even quasispecies. This case is crucial for demonstrating how robust subsampling is for quasispecies data that have previously passed a low-level abundance filter.
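As a rough illustration, the five scenarios can be written as vectors of read counts per haplotype. The sketch below is in Python; the function names are hypothetical. The totals of 10,000 reads (all singletons) and 100,000 reads (single-dominant and prominent cases) are taken from the tables of this article, while the 100,000 reads used for the no-rare case is an assumption.

```python
def all_singletons(n_reads=10_000):
    # Every haplotype is represented by exactly one read.
    return [1] * n_reads

def single_dominant(master_fraction, n_reads=100_000):
    # One master haplotype; every remaining read is a singleton.
    master = round(master_fraction * n_reads)
    return [master] + [1] * (n_reads - master)

def prominent_haplotypes():
    # Six prominent haplotypes plus 3077 singletons (100,000 reads).
    top = [49_231, 24_615, 12_308, 6_154, 3_077, 1_538]
    return top + [1] * 3_077

def no_rare_haplotypes(n_reads=100_000):
    # Master at 90% and 10 haplotypes at 1% each; n_reads is assumed.
    return [round(0.90 * n_reads)] + [round(0.01 * n_reads)] * 10

def flat(n_haplotypes=1_000, k=1):
    # Perfectly even quasispecies: k reads for every haplotype.
    return [k] * n_haplotypes

for name, counts in [
    ("all singletons", all_singletons()),
    ("single dominant, master 0.9", single_dominant(0.9)),
    ("prominent haplotypes", prominent_haplotypes()),
    ("no rare haplotypes", no_rare_haplotypes()),
    ("flat, k = 3", flat(k=3)),
]:
    print(name, "haplotypes:", len(counts), "reads:", sum(counts))
```

With a master fraction of 0.9 the single-dominant vector holds 10,001 haplotypes, matching the Q.90.10 entry tabulated in this article.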

#### 3.1. Bootstrap: The Theory around 0.632

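In a bootstrap cycle, $n$ items are drawn with replacement from a sample of $n$ items. The probability that a given item is missed in one draw is $1 - 1/n$, so the probability that it is missed in all $n$ draws is $(1 - 1/n)^{n}$; this means that the probability to have a given item sampled in a bootstrap cycle is $1 - (1 - 1/n)^{n}$. As $n$ tends to infinity, the limit of this expression is $1 - 1/e \approx 0.632$. This result implies that, in the limit as $n$ grows to infinity, a fraction 0.6321 of the draws in a bootstrap resample are unique realizations of items in the original sample, and the remaining fraction 0.3679 are replicates.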
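The convergence towards 0.632 can be checked numerically. The following Python sketch evaluates $1 - (1 - 1/n)^{n}$ for growing $n$; the function name `p_seen_bootstrap` is an illustrative choice.

```python
import math

def p_seen_bootstrap(n):
    """Probability that a given item appears in a bootstrap resample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

# Limit as n tends to infinity: 1 - 1/e.
limit = 1.0 - 1.0 / math.e

for n in (10, 100, 1_000, 10_000):
    print(n, round(p_seen_bootstrap(n), 6), round(limit, 6))
```

The finite-sample probability approaches the limit from above, so the limit slightly understates the fraction of items seen in small samples.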

#### 3.2. Subsampling a Given Fraction with Replacement

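When a fraction $f$ of the sample is subsampled with replacement, $f \cdot n$ items are drawn, so the probability that a given item is sampled at least once is $1 - (1 - 1/n)^{f \cdot n}$, and the limit as $n$ tends to infinity is $1 - (1/e)^{f}$.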

This limit is obtained by converting the indeterminate form $1^{\infty}$ to $\infty \cdot 0$, noting that $f(x) = e^{\ln(f(x))}$.
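As a hedged check, the limiting proportions of seen and unseen items listed in Table 1 can be recomputed from $1 - (1/e)^{f} = 1 - e^{-f}$. The Python function names below are illustrative.

```python
import math

def seen_limit(f):
    """Limiting proportion of items seen when subsampling a fraction f with replacement."""
    return 1.0 - math.exp(-f)

def seen_finite(n, f):
    """Finite-n counterpart: 1 - (1 - 1/n)^(f*n), with the draw count rounded."""
    draws = round(f * n)
    return 1.0 - (1.0 - 1.0 / n) ** draws

for tenth in range(1, 11):
    f = tenth / 10
    seen = seen_limit(f)
    print(f, round(seen, 4), round(1.0 - seen, 4))
```

Even when the subsample is as large as the sample itself (f = 1), a little over a third of the items remain unseen in a single cycle.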

#### 3.3. The All-Singletons Case

#### 3.4. The Single-Dominant Case

#### 3.5. Prominent Haplotypes

#### 3.6. No Rare Haplotypes

#### 3.7. Flat Quasispecies

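In a flat quasispecies of $n$ haplotypes with $k$ reads each, a full bootstrap cycle draws $n \cdot k$ reads with replacement, and each draw hits a given haplotype with probability $k/(n \cdot k) = 1/n$. The probability that a given haplotype is observed in the cycle is therefore

$$1 - \left(1 - \frac{k}{n \cdot k}\right)^{(n \cdot k)} = 1 - \left(1 - \frac{1}{n}\right)^{(n \cdot k)} \tag{1}$$

where $n \cdot k$ is the sample size. In the limit as $n$ goes to infinity, this probability is $1 - (1/e)^{k}$. Table 11 and Figure 4 show the values computed for $n$ = 1000 haplotypes, $k$ from 1 to 10 reads each, the computed probability, and the corresponding limits.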
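The probabilities and limits reported in Table 11 can be reproduced with a few lines of Python. This is a minimal sketch of the expression $1 - (1 - 1/n)^{n \cdot k}$ and its limit $1 - (1/e)^{k}$; the function names are illustrative.

```python
import math

def p_haplotype_seen(n, k):
    """Probability that a haplotype with k reads is seen in a full bootstrap
    cycle of a flat quasispecies with n haplotypes (sample size n * k)."""
    return 1.0 - (1.0 - 1.0 / n) ** (n * k)

def p_limit(k):
    """Limit of the probability above as n tends to infinity."""
    return 1.0 - math.exp(-k)

n = 1000
for k in range(1, 11):
    print(n, k, n * k, round(p_haplotype_seen(n, k), 7), round(p_limit(k), 7))
```

With 5 or more reads per haplotype, more than 99% of the haplotypes are expected to be recovered in a bootstrap cycle, which is why a prior abundance filter makes the choice of sampling scheme far less critical.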

The ratio $E_{2}[n|k,f]/E_{1}[n|k,f]$ gives the fraction of haplotypes estimated in subsampling with replacement with respect to those estimated in subsampling without replacement (rarefaction). This ratio indicates the accuracy obtained in subsampling with replacement in this scenario, and is plotted in Figure 6 for $n$ = 10,000 haplotypes, $k$ = 1, 2, …, 10 reads per haplotype, and $f$ = 0.1, 0.2, …, 1.
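A sketch of how such a ratio could be computed is given below in Python. The two expectations are assumptions based on standard results, not formulas quoted from this article: the classical hypergeometric rarefaction expectation for sampling without replacement, and $n \cdot (1 - (1 - 1/n)^{m})$ for sampling with replacement, with subsample size $m$ = round($n \cdot k \cdot f$).

```python
def expected_without_replacement(n, k, f):
    """E1 (assumed form): rarefaction expectation, n haplotypes with k reads each."""
    total = n * k
    m = round(total * f)
    # P(haplotype missed) = C(total - k, m) / C(total, m)
    #                     = prod over i < k of (total - m - i) / (total - i)
    p_missed = 1.0
    for i in range(k):
        remaining = total - m - i
        if remaining <= 0:
            # Fewer than k reads are left out, so the haplotype is always seen.
            p_missed = 0.0
            break
        p_missed *= remaining / (total - i)
    return n * (1.0 - p_missed)

def expected_with_replacement(n, k, f):
    """E2 (assumed form): each of the m draws hits a given haplotype with prob 1/n."""
    m = round(n * k * f)
    return n * (1.0 - (1.0 - 1.0 / n) ** m)

def ratio(n, k, f):
    return expected_with_replacement(n, k, f) / expected_without_replacement(n, k, f)

for k in (1, 2, 5, 10):
    print(k, [round(ratio(10_000, k, tenth / 10), 4) for tenth in (1, 5, 10)])
```

Under these assumptions the ratio is lowest for singletons at full sample size, about 0.632, and moves towards 1 as the number of reads per haplotype grows.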

## 4. Discussion

For Hill numbers ${}^{q}D(p)$ (Equation (3)) [25,26] of different orders $q$, the values are bounded above by ${}^{0}D$, the number of haplotypes, and below by ${}^{\infty}D$, the inverse of the frequency of the dominant haplotype. As the order $q$ increases, the relative weight of low-frequency and rare haplotypes in the computation decreases, because low-frequency values are more heavily affected by the exponent. At $q = 0$, all haplotypes have equal weight regardless of their frequency, while at $q = \infty$, only the highest frequency holds significance. This observation suggests that the sensitivity or dependence of a Hill number with respect to sample size decreases as $q$ gets bigger. Considering the correspondence between Hill numbers and other classical diversity indices, we may set the sensitivity order as:

## References

1. Holland, J.; Spindler, K.; Horodyski, F.; Grabau, E.; Nichol, S.; VandePol, S. Rapid evolution of RNA genomes. Science **1982**, 215, 1577–1585.
2. Vignuzzi, M.; Stone, J.K.; Arnold, J.J.; Cameron, C.E.; Andino, R. Quasispecies diversity determines pathogenesis through cooperative interactions in a viral population. Nature **2006**, 439, 344–348.
3. Domingo, E.; Holland, J.J. Mutation rates and rapid evolution of RNA viruses. In Evolutionary Biology of Viruses; Morse, S.S., Ed.; Raven Press: New York, NY, USA, 1994; pp. 161–184.
4. Neumann, A.U.; Lam, N.P.; Dahari, H.; Gretch, D.R.; Wiley, T.E.; Layden, T.J.; Perelson, A.S. Hepatitis C viral dynamics in vivo and the antiviral efficacy of interferon-alpha therapy. Science **1998**, 282, 103–107.
5. Lam, N.P.; Neumann, A.U.; Gretch, D.R.; Wiley, T.E.; Perelson, A.S.; Layden, T.J. Dose-dependent acute clearance of hepatitis C genotype 1 virus with interferon alfa. Hepatology **1997**, 26, 226–231.
6. Martell, M.; Esteban, J.I.; Quer, J.; Genesca, J.; Weiner, A.; Esteban, R.; Guardia, J.; Gomez, J. Hepatitis C virus (HCV) circulates as a population of different but closely related genomes: Quasispecies nature of HCV genome distribution. J. Virol. **1992**, 66, 3225–3229.
7. Gregori, J.; Salicru, M.; Domingo, E.; Sanchez, A.; Esteban, J.I.; Rodríguez-Frías, F.; Quer, J. Inference with viral quasispecies diversity indices: Clonal and NGS approaches. Bioinformatics **2014**, 30, 1104–1111.
8. Gregori, J.; Esteban, J.I.; Cubero, M.; Garcia-Cehic, D.; Perales, C.; Casillas, R.; Alvarez-Tejado, M.; Rodríguez-Frías, F.; Guardia, J.; Domingo, E.; et al. Ultra-deep pyrosequencing (UDPS) data treatment to study amplicon HCV minor variants. PLoS ONE **2013**, 8, e83361.
9. Willis, A.D. Rarefaction, Alpha Diversity, and Statistics. Front. Microbiol. **2019**, 10, 2407.
10. Calle, M.L. Statistical Analysis of Metagenomics Data. Genom. Inform. **2019**, 17, e6.
11. Cameron, E.S.; Schmidt, P.J.; Tremblay, B.J.M.; Emelko, M.B.; Müller, K.M. Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities. Sci. Rep. **2021**, 11, 22302.
12. Hong, J.; Karaoz, U.; de Valpine, P.; Fithian, W. To rarefy or not to rarefy: Robustness and efficiency trade-offs of rarefying microbiome data. Bioinformatics **2022**, 38, 2389–2396.
13. Shamsuri, Q.S.; Ab Majid, A.H. Metagenomic 16S rRNA amplicon data of gut microbial diversity in three species of subterranean termites (Coptotermes gestroi, Globitermes sulphureus and Macrotermes gilvus). Data Br. **2023**, 47, 108993.
14. Gotelli, N.J.; Colwell, R.K. Estimating Species Richness. In Biological Diversity: Frontiers in Measurement and Assessment, 1st ed.; Magurran, E.A., McGill, B.J., Eds.; Oxford University Press: New York, NY, USA, 2011; pp. 1–335.
15. Adombie, C.M.; Bosch, A.; Buti, M.; Campos, C.; Colomer-Castell, S.; Cortese, M.F.; Domingo, E.; Esteban, J.I.; Gallego, I.; Garcia-Cehic, D.; et al. Viral Quasispecies Diversity and Evolution: A Bioinformatics Molecular Approach, 1st ed.; Gregori, J., Rodríguez-Frías, F., Quer, J., Eds.; Il Pensiero Scientific Editore: Rome, Italy, 2023; pp. 1–182.
16. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 25 April 2024).
17. Xie, Y. knitr: A General-Purpose Package for Dynamic Report Generation in R. 2023. Available online: https://rdrr.io/cran/knitr/ (accessed on 25 April 2024).
18. Wickham, H. Welcome to Master the Tidyverse. J. Open Source Softw. **2019**, 4, 1686.
19. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: Cham, Switzerland, 2016; p. 260.
20. Stubner, R. dqrng: Fast Pseudo Random Number Generators. 2023. Available online: https://CRAN.R-project.org/package=dqrng (accessed on 25 April 2024).
21. Magurran, A.E. Measuring Biological Diversity; Wiley-Blackwell: Oxford, UK, 2013; 272p.
22. Gotelli, N.J.; Colwell, R.K. Quantifying biodiversity: Procedures and pitfalls in the measurement and comparison of species richness. Ecol. Lett. **2001**, 4, 379–391.
23. Gregori, J.; Colomer-Castell, S.; Campos, C.; Ibañez-Lligoña, M.; Garcia-Cehic, D.; Rando-Segura, A.; Adombie, C.M.; Pintó, R.; Guix, S.; Bosch, A.; et al. Quasispecies Fitness Partition to Characterize the Molecular Status of a Viral Population. Negative Effect of Early Ribavirin Discontinuation in a Chronically Infected HEV Patient. Int. J. Mol. Sci. **2022**, 23, 14654.
24. Gregori, J.; Colomer-Castell, S.; Ibañez-Lligoña, M.; Garcia-Cehic, D.; Campos, C.; Buti, M.; Riveiro-Barciela, M.; Andrés, C.; Piñana, M.; González-Sánchez, A.; et al. In-host flat-like quasispecies, methods and clinical implications. Microorganisms **2024**, in press.
25. Hill, M.O. Diversity and evenness: A unifying notation and its consequences. Ecology **1973**, 54, 427–432.
26. Gregori, J.; Perales, C.; Rodriguez-Frias, F.; Esteban, J.I.; Quer, J.; Domingo, E. Viral quasispecies complexity measures. Virology **2016**, 493, 227–237.
27. Gregori, J.; Soria, M.E.; Gallego, I.; Guerrero-Murillo, M.; Esteban, J.I.; Quer, J.; Perales, C.; Domingo, E. Rare haplotype load as marker for lethal mutagenesis. PLoS ONE **2018**, 13, e0204877.
28. Todt, D.; Gisa, A.; Radonic, A.; Nitsche, A.; Behrendt, P.; Suneetha, P.V.; Pischke, S.; Bremer, B.; Brown, R.J.; Manns, M.P.; et al. In vivo evidence for ribavirin-induced mutagenesis of the hepatitis E virus genome. Gut **2016**, 65, 1733–1743.
29. Agresti, A. Categorical Data Analysis; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2002.
30. Gregori, J.; Méndez, O.; Katsila, T.; Pujals, M.; Salvans, C.; Villarreal, L.; Arribas, J.; Tabernero, J.; Sánchez, A.; Villanueva, J. Enhancing the Biological Relevance of Secretome-Based Proteomics by Linking Tumor Cell Proliferation and Protein Secretion. J. Proteome Res. **2014**, 13, 3706–3721.
31. Aitchison, J. The Statistical Analysis of Compositional Data; Chapman & Hall: Boca Raton, FL, USA; The Blackburn Press: Caldwell, NJ, USA, 1986; 460p.
32. Pawlowsky-Glahn, V.; Egozcue, J.J.; Tolosana-Delgado, R. Modelling and Analysis of Compositional Data; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2015.
33. Gloor, G.B.; Wu, J.R.; Pawlowsky-Glahn, V.; Egozcue, J.J. It’s all relative: Analyzing microbiome data as compositions. Ann. Epidemiol. **2016**, 26, 322–329.
34. Weiss, S.; Xu, Z.Z.; Peddada, S.; Amir, A.; Bittinger, K.; Gonzalez, A.; Lozupone, C.; Zaneveld, J.R.; Vázquez-Baeza, Y.; Birmingham, A.; et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome **2017**, 5, 27.

**Figure 1.** Subsampling with replacement. Theoretical limit to the number of observed items when subsampling with replacement at different subsampling fractions.

**Figure 2.** Single-dominant case. Estimation of the number of haplotypes at different subsampling fractions after B resampling cycles with and without replacement.

**Figure 3.** Single-dominant case. Estimation of the master frequencies at different subsampling fractions after B resampling cycles with and without replacement.

**Figure 4.** Theoretical limit to the number of observed haplotypes in a bootstrap resample cycle. Flat quasispecies with growing reads per haplotype.

**Figure 6.** Flat quasispecies. Ratio of number of haplotypes estimated in subsampling with replacement versus those estimated by the rarefaction equation.

**Table 1.** Subsampling a given fraction with replacement. Proportion of items seen and unseen in a single resampling cycle.

Fraction | Seen | Unseen |
---|---|---|

0.1 | 0.0952 | 0.9048 |

0.2 | 0.1813 | 0.8187 |

0.3 | 0.2592 | 0.7408 |

0.4 | 0.3297 | 0.6703 |

0.5 | 0.3935 | 0.6065 |

0.6 | 0.4512 | 0.5488 |

0.7 | 0.5034 | 0.4966 |

0.8 | 0.5507 | 0.4493 |

0.9 | 0.5934 | 0.4066 |

1.0 | 0.6321 | 0.3679 |

**Table 2.** All-singleton case. Estimating the number of haplotypes. Subsampling a given fraction with replacement.

Frac | True | Expected | Median | IQR | SD | Unique | Replicated |
---|---|---|---|---|---|---|---|

0.1 | 1000 | 952.1 | 952.0 | 8.00 | 6.21 | 0.9520 | 0.0480 |

0.2 | 2000 | 1813.5 | 1812.0 | 17.00 | 12.52 | 0.9060 | 0.0940 |

0.3 | 3000 | 2592.9 | 2593.0 | 23.00 | 15.88 | 0.8643 | 0.1357 |

0.4 | 4000 | 3298.1 | 3295.0 | 25.50 | 20.20 | 0.8238 | 0.1762 |

0.5 | 5000 | 3936.2 | 3932.0 | 33.00 | 24.95 | 0.7864 | 0.2136 |

0.6 | 6000 | 4513.5 | 4512.0 | 39.25 | 27.85 | 0.7520 | 0.2480 |

0.7 | 7000 | 5035.9 | 5033.0 | 37.00 | 28.11 | 0.7190 | 0.2810 |

0.8 | 8000 | 5508.5 | 5505.0 | 42.00 | 30.37 | 0.6881 | 0.3119 |

0.9 | 9000 | 5936.1 | 5934.0 | 43.25 | 32.02 | 0.6593 | 0.3407 |

1.0 | 10,000 | 6323.0 | 6321.5 | 42.00 | 32.47 | 0.6322 | 0.3678 |

ID | Master | Hpl. No. |
---|---|---|

Q.90.10 | 0.9 | 10,001 |

Q.80.20 | 0.8 | 20,001 |

Q.70.30 | 0.7 | 30,001 |

Q.60.40 | 0.6 | 40,001 |

Q.50.50 | 0.5 | 50,001 |

Q.40.60 | 0.4 | 60,001 |

Q.30.70 | 0.3 | 70,001 |

Q.20.80 | 0.2 | 80,001 |

Q.10.90 | 0.1 | 90,001 |

ID | Subsz | NoRpl | WithRpl | Exact |
---|---|---|---|---|

Q.90.10 | 0.50 | 5002.0 | 3933.0 | 5000 |

Q.90.10 | 0.25 | 2500.0 | 2215.0 | 2500 |

Q.90.10 | 0.10 | 1000.0 | 953.0 | 1000 |

Q.90.10 | 0.05 | 501.0 | 490.0 | 500 |

Q.80.20 | 0.50 | 10,002.5 | 7866.0 | 10,000 |

Q.80.20 | 0.25 | 5002.5 | 4423.0 | 5000 |

Q.80.20 | 0.10 | 1999.0 | 1906.0 | 2000 |

Q.80.20 | 0.05 | 1001.5 | 977.0 | 1000 |

Q.70.30 | 0.50 | 14,996.0 | 11,799.5 | 15,000 |

Q.70.30 | 0.25 | 7495.5 | 6635.0 | 7500 |

Q.70.30 | 0.10 | 2999.0 | 2858.0 | 3000 |

Q.70.30 | 0.05 | 1501.0 | 1466.5 | 1500 |

Q.60.40 | 0.50 | 20,005.0 | 15,741.0 | 20,000 |

Q.60.40 | 0.25 | 10,001.0 | 8852.0 | 10,000 |

Q.60.40 | 0.10 | 3999.5 | 3807.5 | 4000 |

Q.60.40 | 0.05 | 1998.0 | 1951.0 | 2000 |

Q.50.50 | 0.50 | 25,001.0 | 19,676.5 | 25,000 |

Q.50.50 | 0.25 | 12,500.5 | 11,070.0 | 12,500 |

Q.50.50 | 0.10 | 5006.0 | 4759.0 | 5000 |

Q.50.50 | 0.05 | 2499.0 | 2440.0 | 2500 |

Q.40.60 | 0.50 | 29,996.0 | 23,609.5 | 30,000 |

Q.40.60 | 0.25 | 14,993.0 | 13,274.0 | 15,000 |

Q.40.60 | 0.10 | 6000.0 | 5706.0 | 6000 |

Q.40.60 | 0.05 | 3001.5 | 2927.5 | 3000 |

Q.30.70 | 0.50 | 35,001.0 | 27,542.5 | 35,000 |

Q.30.70 | 0.25 | 17,504.5 | 15,487.5 | 17,500 |

Q.30.70 | 0.10 | 7004.0 | 6661.0 | 7000 |

Q.30.70 | 0.05 | 3499.0 | 3415.0 | 3500 |

Q.20.80 | 0.50 | 39,997.0 | 31,477.5 | 40,000 |

Q.20.80 | 0.25 | 20,002.5 | 17,701.0 | 20,000 |

Q.20.80 | 0.10 | 7997.0 | 7613.5 | 8000 |

Q.20.80 | 0.05 | 4003.0 | 3904.0 | 4000 |

Q.10.90 | 0.50 | 45,001.0 | 35,409.0 | 45,000 |

Q.10.90 | 0.25 | 22,502.0 | 19,914.0 | 22,500 |

Q.10.90 | 0.10 | 9003.0 | 8565.0 | 9000 |

Q.10.90 | 0.05 | 4503.0 | 4389.0 | 4500 |

ID | Subsz | NoRpl | WithRpl | Exact |
---|---|---|---|---|

Q.90.10 | 0.50 | 0.899980 | 0.90005 | 0.9 |

Q.90.10 | 0.25 | 0.900040 | 0.90004 | 0.9 |

Q.90.10 | 0.10 | 0.900100 | 0.90000 | 0.9 |

Q.90.10 | 0.05 | 0.900000 | 0.90000 | 0.9 |

Q.80.20 | 0.50 | 0.799970 | 0.80010 | 0.8 |

Q.80.20 | 0.25 | 0.799940 | 0.80014 | 0.8 |

Q.80.20 | 0.10 | 0.800200 | 0.79980 | 0.8 |

Q.80.20 | 0.05 | 0.799900 | 0.80020 | 0.8 |

Q.70.30 | 0.50 | 0.700100 | 0.70016 | 0.7 |

Q.70.30 | 0.25 | 0.700220 | 0.70012 | 0.7 |

Q.70.30 | 0.10 | 0.700200 | 0.69980 | 0.7 |

Q.70.30 | 0.05 | 0.700000 | 0.69980 | 0.7 |

Q.60.40 | 0.50 | 0.599920 | 0.60004 | 0.6 |

Q.60.40 | 0.25 | 0.600000 | 0.59988 | 0.6 |

Q.60.40 | 0.10 | 0.600150 | 0.59990 | 0.6 |

Q.60.40 | 0.05 | 0.600600 | 0.60040 | 0.6 |

Q.50.50 | 0.50 | 0.500000 | 0.49985 | 0.5 |

Q.50.50 | 0.25 | 0.500020 | 0.49982 | 0.5 |

Q.50.50 | 0.10 | 0.499500 | 0.50005 | 0.5 |

Q.50.50 | 0.05 | 0.500400 | 0.50000 | 0.5 |

Q.40.60 | 0.50 | 0.400100 | 0.39998 | 0.4 |

Q.40.60 | 0.25 | 0.400320 | 0.39988 | 0.4 |

Q.40.60 | 0.10 | 0.400100 | 0.40040 | 0.4 |

Q.40.60 | 0.05 | 0.399900 | 0.39980 | 0.4 |

Q.30.70 | 0.50 | 0.299993 | 0.30011 | 0.3 |

Q.30.70 | 0.25 | 0.299860 | 0.30008 | 0.3 |

Q.30.70 | 0.10 | 0.299700 | 0.30010 | 0.3 |

Q.30.70 | 0.05 | 0.300400 | 0.30000 | 0.3 |

Q.20.80 | 0.50 | 0.200072 | 0.19978 | 0.2 |

Q.20.80 | 0.25 | 0.199924 | 0.19992 | 0.2 |

Q.20.80 | 0.10 | 0.200400 | 0.20025 | 0.2 |

Q.20.80 | 0.05 | 0.199600 | 0.20020 | 0.2 |

Q.10.90 | 0.50 | 0.100000 | 0.10007 | 0.1 |

Q.10.90 | 0.25 | 0.099960 | 0.09990 | 0.1 |

Q.10.90 | 0.10 | 0.099800 | 0.10000 | 0.1 |

Q.10.90 | 0.05 | 0.099600 | 0.10000 | 0.1 |

Number of Reads | 100,000 |
---|---|

Number of haplotypes | 3083 |

Prominent haplotypes (read counts) | 49,231, 24,615, 12,308, 6154, 3077, 1538 |

Singletons (reads) | 3077 |

Subs | SngFr | Hpl_1 | Hpl_2 | Hpl_3 | Hpl_4 | Hpl_5 | Hpl_6 | Ov1 |
---|---|---|---|---|---|---|---|---|

True | 0.03077 | 0.49231 | 0.24615 | 0.12308 | 0.06154 | 0.03077 | 0.01538 | 0 |

0.5 | 0.03076 | 0.49211 | 0.24626 | 0.12315 | 0.06148 | 0.03084 | 0.01542 | 0 |

0.25 | 0.03080 | 0.49224 | 0.24606 | 0.12316 | 0.06164 | 0.03076 | 0.01536 | 0 |

0.1 | 0.03090 | 0.49240 | 0.24635 | 0.12280 | 0.06160 | 0.03070 | 0.01540 | 0 |

0.05 | 0.03100 | 0.49280 | 0.24620 | 0.12280 | 0.06120 | 0.03060 | 0.01520 | 0 |

Subs | SngFr | Hpl_1 | Hpl_2 | Hpl_3 | Hpl_4 | Hpl_5 | Hpl_6 | Ov1 |
---|---|---|---|---|---|---|---|---|

True | 0.03077 | 0.49231 | 0.24615 | 0.12308 | 0.06154 | 0.03077 | 0.01538 | 0.00000 |

0.5 | 0.01872 | 0.49230 | 0.24604 | 0.12302 | 0.06146 | 0.03078 | 0.01542 | 0.01210 |

0.25 | 0.02396 | 0.49232 | 0.24626 | 0.12308 | 0.06148 | 0.03068 | 0.01536 | 0.00684 |

0.1 | 0.02780 | 0.49215 | 0.24620 | 0.12320 | 0.06170 | 0.03070 | 0.01540 | 0.00285 |

0.05 | 0.02900 | 0.49280 | 0.24600 | 0.12320 | 0.06140 | 0.03060 | 0.01540 | 0.00140 |

Subs | HplNo | Hpl_01 | Hpl_02 | Hpl_03 | Hpl_04 | Hpl_05 |
---|---|---|---|---|---|---|

True | 11 | 0.90000 | 0.01000 | 0.01000 | 0.01000 | 0.0100 |

0.5 | 11 | 0.89996 | 0.01002 | 0.01004 | 0.01002 | 0.0100 |

0.25 | 11 | 0.89990 | 0.01000 | 0.01000 | 0.01004 | 0.0100 |

0.1 | 11 | 0.90000 | 0.01010 | 0.01000 | 0.01000 | 0.0101 |

0.05 | 11 | 0.90000 | 0.01000 | 0.01000 | 0.01000 | 0.0100 |

Subs | Hpl_06 | Hpl_07 | Hpl_08 | Hpl_09 | Hpl_10 | Hpl_11 |
---|---|---|---|---|---|---|

True | 0.0100 | 0.01000 | 0.01000 | 0.01 | 0.01000 | 0.01000 |

0.5 | 0.0100 | 0.01002 | 0.01001 | 0.01 | 0.01002 | 0.00999 |

0.25 | 0.0100 | 0.00996 | 0.00996 | 0.01 | 0.01000 | 0.00996 |

0.1 | 0.0101 | 0.01000 | 0.01010 | 0.01 | 0.01000 | 0.01000 |

0.05 | 0.0100 | 0.01000 | 0.01000 | 0.01 | 0.01000 | 0.01000 |

Subs | HplNo | Hpl_01 | Hpl_02 | Hpl_03 | Hpl_04 | Hpl_05 |
---|---|---|---|---|---|---|

True | 11 | 0.90000 | 0.01000 | 0.01000 | 0.01000 | 0.01000 |

0.5 | 11 | 0.90006 | 0.00999 | 0.01002 | 0.01002 | 0.01004 |

0.25 | 11 | 0.90004 | 0.01000 | 0.01004 | 0.00992 | 0.01004 |

0.1 | 11 | 0.90010 | 0.01000 | 0.01000 | 0.01000 | 0.01000 |

0.05 | 11 | 0.90010 | 0.01000 | 0.01000 | 0.01000 | 0.01020 |

Subs | Hpl_06 | Hpl_07 | Hpl_08 | Hpl_09 | Hpl_10 | Hpl_11 |
---|---|---|---|---|---|---|

True | 0.01000 | 0.01000 | 0.01000 | 0.01000 | 0.01000 | 0.01000 |

0.5 | 0.01002 | 0.00998 | 0.01002 | 0.00998 | 0.00996 | 0.01002 |

0.25 | 0.01000 | 0.01000 | 0.00994 | 0.01004 | 0.01004 | 0.01000 |

0.1 | 0.00990 | 0.01000 | 0.01000 | 0.01000 | 0.01000 | 0.01000 |

0.05 | 0.00980 | 0.00980 | 0.01000 | 0.00980 | 0.00990 | 0.01000 |

**Table 11.** Flat quasispecies: full bootstrap cycle results at growing haplotype frequencies (Equation (1)).

nHpl | k | Reads | Prob | Limit |
---|---|---|---|---|

1000 | 1 | 1000 | 0.6323046 | 0.6321206 |

1000 | 2 | 2000 | 0.8648001 | 0.8646647 |

1000 | 3 | 3000 | 0.9502876 | 0.9502129 |

1000 | 4 | 4000 | 0.9817210 | 0.9816844 |

1000 | 5 | 5000 | 0.9932789 | 0.9932621 |

1000 | 6 | 6000 | 0.9975287 | 0.9975212 |

1000 | 7 | 7000 | 0.9990913 | 0.9990881 |

1000 | 8 | 8000 | 0.9996659 | 0.9996645 |

1000 | 9 | 9000 | 0.9998771 | 0.9998766 |

1000 | 10 | 10,000 | 0.9999548 | 0.9999546 |

Quantity | Notation |
---|---|
Haplotypes | n |
Reads per haplotype | k |
Full sample size | n · k |
Subsampling fraction | f |
Subsample size | round(n · k · f) |


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Gregori, J.; Ibañez-Lligoña, M.; Colomer-Castell, S.; Campos, C.; Quer, J. Virus Quasispecies Rarefaction: Subsampling with or without Replacement? *Viruses* **2024**, *16*, 710. https://doi.org/10.3390/v16050710