1. Introduction
The human virome, a diverse community of viruses within the human body, remains relatively unexplored compared to the microbiome. Comprising bacteriophages, viruses infecting human cells, and transient viruses, it reflects the diversity of Earth’s viruses. Recent advancements in extraction and detection methods have unveiled this complexity [
1,
2].
Among the greatest human virome reservoirs, the gastrointestinal tract is the richest due to its diversity and density of viral populations [
3]. Interactions between phages, bacteria, and the human host are crucial. Temperate bacteriophages maintain homeostasis, as the phage composition mirrors that of their bacterial prey, but they are also able to disrupt microbial balance. When this happens, a situation of dysbiosis may occur. An imbalance in human microbial communities is being increasingly linked to chronic diseases, infections, and cancer. It has been postulated that some viruses, both resident and transient, can influence tumorigenesis by modifying the microbiome’s predator–prey dynamics [
4,
5,
6,
7].
The advent of metagenomic sequencing in the early 1980s marked a significant turning point for microbiomics and viromics, offering a vast field of new possibilities in genomics. Viral populations were first reported in 2002 [
8], sparking the exploration of the human virome’s potential. However, metagenomics and NGS analyses provide incomplete viral profiles, and improvements are needed, especially in nucleic acid extraction, to enhance accuracy [
9,
10].
Viral nucleic acid extraction encompasses unique challenges due to the low abundance of viral genetic material compared to bacterial genomes. Minimizing contaminants and maximizing viral DNA or RNA yield are crucial for accurate quantification and characterization. Furthermore, physical handling of virus-like particles (VLPs) presents another challenge. VLPs and viral particles can be fragile and easily disrupted during extraction, thus potentially leading to the loss of nucleic acids and compromising downstream analyses [
11,
12]. Furthermore, it should be noted that many viruses are RNA-based; therefore, precautions must be taken to prevent RNA degradation by RNases, necessitating RNase-free procedures [
13]. These are two reasons why optimizing extraction and purification methods has become an essential concern in achieving reliable virome sequencing. To address these issues, various nucleic acid extraction methods have been developed, being classified as bulk metagenomes and VLP approaches.
On one hand, bulk extraction captures nucleic acids from the entire microbial community, offering efficiency, easiness, and rapidness, but posing an increased risk of contamination with non-viral genetic material while being computationally demanding [
14]. On the other hand, VLP-specific methods are targeted approaches, resulting in purer methods that require less sequencing depth. However, VLP approaches can be biased towards some viruses (due to differences in size, envelope, etc.) and yield lower quantification values, usually requiring amplification steps that may further increase the issue of bias. In addition, they are usually tedious processes. It is still a subject of debate whether choosing one strategy or the other is more optimal for viral nucleic acid extraction [
14].
The results of the studies carried out on analyses of the virome are highly dependent on how the analyses were obtained; thus, for any starting sample, factors external to the composition of the sample itself, such as the sample collection, preservation, and handling, or the extraction method used, are critical for a solid analysis. Therefore, establishing the reliability, representativeness, and reproducibility of the data generated is essential for drawing valid conclusions from the obtained results. To accomplish this, we explored the human virome’s complexities, aiming to define the optimal viral nucleic acid extraction protocol that combines purity and reproducibility, ensuring accurate virome analysis.
2. Materials and Methods
2.1. Study Participants, Sample Collection, Preparation, and Storage
All fecal samples used for this study were previously collected in 2021 as part of the AECC 2017-1485 project awarded to Prof. A. Moya, funded by the Fundación Científica de la Asociación Española contra el Cáncer. Aliquots of stool samples remained stored at −70 °C at the facilities of the biobank Biobank IBSP-CV (PT17/0015/0017), integrated in the Spanish National Biobanks Network and in the Valencian Biobanking Network, located at the Foundation for the Promotion of Sanitary and Biomedical Research of Valencia Region (FISABIO-Public Health). Stool samples were selected from previously anonymized samples obtained from a cohort of healthy volunteers from the Valencian Region. They had been characterized by carrying Lynch syndrome mutations, genetically diagnosed at the Program of Genetic Counseling in Cancer of the Valencian Community (Spain). These samples had their corresponding signed informed consent and information sheets. All experimental protocols, as well as the sample cession, were approved by the Ethics Committee for Clinical Research of the Directorate General of Public Health and Center for Advanced Research (CEIC-DSP/CSISP).
Briefly, each participant was provided with a sample collection kit, which included sterile containers containing 2 mL of RNAlater Solution (Ambion, Austin, TX, USA) to preserve and stabilize RNA integrity until arrival at the laboratory. Upon arrival at the laboratory, samples were promptly homogenized by adding 2 mL of phosphate-buffered saline (PBS), consisting of 8 g of NaCl, 0.2 g of KCl, 1.44 g of Na2HPO4, and 0.24 g of KH2PO4 per liter (pH 7.2). Subsequently, they were centrifuged at 805× g at 4 °C for 5 min to eliminate fecal debris. The 1.5 mL samples of resulting supernatants were then split into 2 mL screw cap microcentrifuge tubes and stored at −70 °C prior to processing.
For the experiments, two tubes containing the frozen fecal solution from each subject were thawed on ice and split into six 500 μL aliquots, spanning two replicas for the three tested extraction protocols. One of them was based on bulk metagenomics (Protocol 1), and two of them were based on a VLP strategy (Protocols 2 and 3).
2.2. Nucleic Acid Extraction Procedures
2.2.1. Protocol 1: Bulk Metagenomics Strategy
The first extraction protocol, based on bulk metagenomics, used the QIAamp ®Fast DNA Stool Mini (QIAGEN, Valencia CA, USA), following the manufacturer’s instructions with some modifications. In addition to the stool samples, two replicas used as negative controls, containing sterile water instead of stool solution, were run in parallel to the samples throughout the procedure. Briefly, samples were centrifuged at maximum speed (20,000× g), the supernatant was discarded, and 1 mL of InhibitEX Buffer and 20 μL of lysozyme were added. The samples were then incubated for 30 min at 37 °C. After the incubation, 200 μL of glass beads, which had been acid-washed (Sigma, St. Louis, MI, USA) and previously prepared, were added to the samples, and the mix was heated for 5 min at 95 °C. The mix was then centrifuged at maximum speed for 1 min. Next, 600 μL of the supernatant was transferred to a 1.5 mL microcentrifuge tube containing 20 μL of proteinase K. After adding 600 μL of Buffer AL, the tubes were incubated at 70 °C for 10 min, followed by the addition of 600 μL of absolute ethanol and mixing by vortex. Afterwards, 600 μL of the sample was transferred to the column and centrifuged for 1 min at maximum speed. This step was repeated until all the samples passed through the column. Then, 500 μL of the first washing buffer, Buffer AW1, was introduced to the column, and 1 min of centrifugation at maximum speed was carried out. This step was repeated with Buffer AW2. Finally, the column was placed into the final collection tube where 50 μL of the elution buffer ATE was added, and, after incubation at room temperature for 1 min, DNA was eluted by centrifugation for 1 min. The fluorometric quantification was then assessed using a Qubit® dsDNA HS (High Sensitivity) Assay Kit (Invitrogen by ThermoFisher Scientific, Waltham, MA, USA) in an Invitrogen™ Qubit™ 3 Fluorometer (Invitrogen by ThermoFisher Scientific), according to the manufacturer’s instructions.
2.2.2. Protocol 2: VLP-Enrichment Strategy A (Modified NetoVIR)
The first of the two VLP strategy protocols tested was based on the Novel Enrichment Technique of Viromes (NetoVIR) [
15], a fast, reproducible, and high-throughput technique expressly devoted to NGS gut viromics studies However, it included some modifications affecting the RT-PCR amplification and PCR product purification. As in protocol 1, two negative controls, containing sterile water instead of stool solution, were run in parallel to the samples throughout the procedure, but in this case, two additional negative controls of the RT-PCR reaction were also included by adding sterile water instead of extraction products.
For the enrichment step, the fecal suspension was homogenized and centrifuged at 17,000× g for 3 min, retrieving at least 200 μL of the supernatant, which was then filtered in a 0.8 μm filter (Sartorius, Göttingen, Germany) at 17,000× g for 1 min. Finally, 7 μL of a premade 20× resolving enzyme buffer (12.11 g of 50 mM Tris, 1.47 g of 5 mM CaCl2 and 0.61 g of 1.5 mM MgCl2 in 80 mL of ultrapure H2O, pH 8.0), 2 μL of benzonase (Sigma-Aldrich, St. Louis, MI, USA), and 1 μL of micrococcal nuclease (Thermo Fisher Scientific) were added to 130 μL of sample filtrate. The mixture was incubated for 2 h at 37 °C. To stop the reaction, 7 μL of 0.2 M EDTA was added. The extraction step of this protocol was optimized using the QIAamp Viral RNA Mini kit (QIAGEN), following the manufacturer’s protocol, but proceeding without the addition of carrier RNA to the lysis buffer at the first step of the extraction. For the amplification by RT-PCR, the components (1 μL of 50 μM random hexamers, 1 μL of 10 mM dNTP mix, 11 μL of template RNA) from the SuperScriptTM IV First-Strand cDNA Synthesis Reaction kit (Invitrogen, Carlsbad, CA, USA) were combined in a reaction tube. This mix was heated at 65 °C for 5 min and incubated on ice for 1 min. Then, 4 μL of 5X SSIV buffer, 1 μL of 100 mM DTT, 1 μL RNaseOUT™ Recombinant RNase Inhibitor, and 1 μL of SuperScriptTM IV Reverse Transcriptase (200 U/μL) were added to the mix and incubated at 23 °C for 10 min, 50–55 °C for 10 min, and 80 °C for 10 min, then finished by heating at 85 °C for 5 min. Afterwards, the second-strand synthesis was achieved using the second-strand cDNA synthesis kit’s (Invitrogen) reagents and following its instructions. First, 20 μL of first-strand cDNA synthesis reaction mixture, 55 μL of nuclease-free water, 20 μL of 5X second-strand reaction mix, and 5 μL of second-strand enzyme mix were pipetted directly into the first-strand reaction tube on ice. Incubation was performed at 16 °C for 60 min, and to stop the reaction, 6 μL 0.5 M EDTA with pH 8.0 was added. The reaction was kept on ice until 10 μL (100 U) RNase was added. Then, it was incubated for 5 min at room temperature. Finally, an additional modification of the original protocol was introduced for the PCR product purification, which was carried out using the DNA Clean and Concentrator-5 kit (Zymo Research, Freiburg, Germany). For each sample, 5 volumes of DNA binding buffer were added, and the mix was transferred to a Zymo-Spin column with a collection tube and centrifuged at 14,000× g for 30 s. Subsequently, 200 μL of DNA washing buffer was loaded into the column and centrifuged at 14,000× g for 30 s, a step that was repeated once more. Finally, 15 μL of preheated UltraPure DEPC-treated water, heated at 70 °C, was added and an incubation lasting 5 min at room temperature was performed, followed by centrifugation at 14,000× g for 1 min. The fluorometric quantification was then assessed using a Qubit® dsDNA HS (High Sensitivity) Assay Kit according to the manufacturer’s instructions.
2.2.3. Protocol 3: VLP-Enrichment Strategy B (Modified SISPA)
The second of the two VLP strategy protocols tested was based on the sequence-independent single-primer amplification (SISPA) technique. It shared the initial steps, that is, the enrichment and nucleic acid extraction, with Protocol 2, as well as the PCR product purification. As in Protocol 2, two negative extraction controls and two additional negative RT-PCR reaction controls were included, but two negative PCR amplification controls were also used for the amplification step of this protocol (see below).
After the enrichment and nucleic acid extraction (see protocol 2 for details), the amplification step was started by mixing 1 μL of random primer A (5′-GTTTCCCAGTCACGATCNNNNNNNNN-3′, Condalab, Torrejón de Ardoz, Spain), 1 μL of dNTPs (Ecogen, Barcelona, Spain), 3 μL Ultrapure DEPC-treated water, and 8 μL of extracted DNA/RNA. This mix was denatured by incubating it for 5 min at 65 °C, then cooled on ice for 5 min. Afterwards, the Reverse Transcriptase SuperScriptTM IV kit (Invitrogen) was utilized according to the manufacturer’s instructions. For each tube, 4 μL of 5× SSIV buffer, 1 μL of 100 mM DTT, 1 μL of RNase™ Out, and 1 μL of SuperScript™ IV enzyme were added. The following reverse transcription conditions were used: 23 °C for 10 min; 50 °C for 10 min; and, finally, 80 °C for 10 min. Subsequently, 1 μL of RNase H (ThermoFisher Scientific) was introduced to the tubes (kept on ice). Then, for the second-strand synthesis, the tubes were incubated at 95 °C for 5 min in the thermocycler and cooled down on ice for 5 min. A Sequenase I mix was prepared with the following reagents: 2 μL of Sequenase reaction buffer, 0.3 μL of Sequenase 2.0 enzyme (ThermoFisher Scientific), and 7.7 μL of UltraPureTM DEPC-treated water. The following conditions were used: from 10 °C to 37 °C for 8 min, ramping up 1 °C every 18 s, and 37 °C for 8 min, then 94 °C for 2 min and 10 °C for 5 min. To complete the second-strand synthesis, the Sequenase II mix was prepared by adding 0.9 μL of enzyme dilution reagent and 0.3 μL of Sequenase 2.0 enzyme in the following conditions: 10 °C to 37 °C for 8 min, ramping up 1 °C every 18 s, then 37 °C for 8 min, 94 °C for 8 min, and 10 °C for 5 min. The amplification step was carried out using samples with a mix of 8 μL of MgCl2, 10 μL of PCR Gold Buffer 10X, 1 μL of dNTPs 100 mM, 1 μL of Taq DNA polymerase, 1 μL of Primer B (5′-GTTTCCCAGTCACGATC-3′, Condalab), and 69 μL of UltraPureTM DEPC-treated water. Finally, 10 μL of the sample was added. Tubes were incubated at 95 °C for 10 min, 94 °C for 30 s, 40 °C for 30 s, 50 °C for 30 s, 72 °C for 1 min, and 72 °C for 10 min. Finally, fluorometric quantification was assessed using a Qubit® dsDNA HS (high-sensitivity) Assay Kit according to the manufacturer’s instructions.
2.3. Library Preparation and Sequencing
Before construction of the libraries, an automated electrophoresis process using the Agilent 4150 TapeStation System (Agilent, Santa Clara, CA, USA) with the high-sensitivity D5000 ScreenTape system kit, allowed for the assessment of the quality and length of DNA and RNA samples, in this case analyzing DNA molecules from 100–5000 base pairs (bp). Libraries were generated for both the samples and negative controls. Libraries were prepared using Nextera® XT DNA kit (Illumina, Carlsbad, CA, USA) based on fragmentation and tagmentation of the input DNA. The kits employed were the Nextera XT DNA Library Preparation Kit (Illumina) and the Nextera XT Index Kit (Illumina), following the manufacturer’s guide with the following modifications: 10 μL of TD buffer, 8 μL of DNA at 0.2 ng/μL, and 2 μL of tagmentase. The incubation at 55 °C lasted 2 min 30 s, and the clean-up was performed with 0.8× Ampure beads. Furthermore, once this procedure was finished, a quantification using a Qubit® dsDNA HS (high-sensitivity) Assay Kit was carried out. Then, after examining the DNA concentration, samples with undesirable values were selected for a 10-cycle recovery PCR, being purified with 0.9× Ampure Beads. Sequencing was carried out using the NextSeq® 500 System from Illumina, selecting the conditions for the acquisition of a single-read DNA sequence with a length of 150 base pairs using the NextSeq550 MidOutput kit (150 c).
2.4. Bioinformatic Analysis
The raw BCL files were converted to standard fastq files by means of the bcl2fastq program (version 2.20.0.422) of Illumina, and the single-end reads were filtered out and trimmed for quality with the Fastp application (version 0.23.3) [
16] in four sequential steps: (i) front and tail bases with quality values lower than 20 were trimmed; (ii) the bases lower than 15 on the right of the mean quality in the front to tail sliding window with size 4 were dropped; (iii) ploy X tails were trimmed; and (iv) reads shorter than 50 bases were discarded. The filtered reads were mapped onto the Homo sapiens genome database (GenBank assembly accession GCA_000001405.29, GRCh38.p14 release [
17] genome using Bowtie2 (version 2.5.1)) [
18] with a very-sensitive-local preset. Kaiju (version 1.9.2) [
19] was used for taxonomic assignment of non-human reads by comparing them to the NCBI nr+euk (10 March 2022) reference database, with a maximum of 5 mismatches allowed and a minimum matching length of 20 amino acids. Using R (version 4.1.1) [
20], all the reads belonging to the same taxon and sample were counted, and the results were saved in a table. After taxonomic annotation, only those reads matching viruses were further processed. A strict filtering step of the viral reads identified in the negative controls was carried out, so all viruses that had at least one read at the species level in any negative control of their corresponding protocols were removed from the samples.
2.5. Statistical Analysis
The obtained abundance matrix at the family level was subjected to statistical analysis using R statistical software v4.2.3 (March, 2023) [
20]. The nucleic acid composition of each virus was determined and plotted by protocol and by sample.
To decrease the possible effects of protocol-specific contamination, all taxa identified in the control samples were filtered out from all samples of the respective protocol, and the new abundance matrix was used for all further analyses. Normalization was carried out using the “Analysis of Compositions of Microbiomes with Bias Correction” (ANCOM-BC) package [
21]. To delineate differences in the abundances of virus families of each donor attributed to protocol variations, we utilized the normalized matrix and computed the log2 fold-change. To establish the reproducibility of each protocol, using normalized data, the beta diversity among the replicates of each sample was computed using the Bray–Curtis dissimilarity index through the “Vegan” package [
22], and a paired Wilcoxon test was used to compare the different protocols using the wilcox.test function from the basic R package. Moreover, to identify the differences between the number of viral taxa detected using each protocol, alpha diversity comparing protocols were determined using the Shannon index and the “Microbiome” package [
23].
To further analyze the obtained data, only those samples that had at least one hundred viral reads were considered. Normalization using ANCOM-BC was performed on the remaining samples, followed by conducting an analysis of variance using the distance matrices (Adonis) test in the “Vegan” R package. The differential abundance of taxa between protocols was determined using a Wilcoxon test. The identified taxa were then clustered both by protocol and by similitude.
4. Discussion
The human virome can be defined as a collection of all viruses settled in a specific niche of the human body, which are part of the human microbiota. Although the bacterial fraction of the microbiota has been more extensively researched than the viral fraction, the significance of the different viruses for human well-being should not be overlooked. Recent advancements in next-generation sequencing (NGS) technologies have enabled the identification of viral sequences within biological samples. However, the study of the human virome still has important drawbacks. In the present study, we compared three different extraction protocols, aiming to identify the pros and cons for their use in virome analysis.
Fecal samples provided by 10 volunteers were extracted using two replicas per sample and three different extraction protocols. The total extracted DNA from a total of 60 samples was sequenced, and the viral data were analyzed.
Among the viral species found in humans, bacteriophages have been identified as the most prevalent viruses in the human digestive tract, while gut phages are predominantly part of dsDNA virus families [
8,
24,
25,
26,
27]. Furthermore, environmental factors and genetic traits may also contribute to the diversity of phage populations, making viromes highly individual-specific [
28]. It is important to determine the abundance of these types of viruses in gut samples. The main weak point of Protocol 1 (bulk metagenomics strategy) is the fact that it does not comprise a reverse transcription step; therefore, it only allows for the sequencing of DNA viruses (mostly dsDNA). This may be considered as a shortcoming for studies aiming at the analysis of the whole viral community, including those other than DNA viruses. However, this methodologically simple protocol, compared to the two VLP-based strategies used in this work, seems to be the most efficient approach in terms of identification of higher numbers of reads attributable to bacteriophage taxa, and, as the majority of bacteriophages are dsDNA viruses, it can be the most recommendable protocol if only bacteriophage identification, although as complete as possible, is sought. In addition, some single-stranded DNA viruses were found in the Protocol 1 outcome, despite the fact that the method for library construction only works on dsDNA. The main explanation for this result is that when the replication of the single DNA strand from these viruses occurs, they incorporate a transient double-strand phase resembling prokaryotic rolling circle plasmids [
29]; therefore, those ssDNA viruses were identified at the moment when the nucleic acid extraction was performed. However, the observation of lower levels of ssDNA viruses in Protocol 1 compared to Protocols 2 (VLP-enrichment strategy A) and 3 (VLP-enrichment strategy B) can be explained by the absence of DNase treatment; therefore, no removal of DNA molecules occurred in any protocol. This was combined with the RT-PCR and second strand synthesis steps carried out in Protocols 2 and 3. These additional steps, used to convert ssRNA and dsRNA into ds-cDNA, also resulted in the random priming and amplification of a second strand of ssDNA viruses using their single-strand genomes as templates. Although to a lesser extent, both Protocol 2 and Protocol 3 also showed that dsDNA viruses made up the highest percentage of the identified viruses, making them a reasonable, although less efficient, alternative when phages are searched for along with other types of viruses.
The dsDNA portion was still the most representative viral type found in Protocol 2, and the fraction assigned as unclassified was increased compared to Protocol 1. Interestingly, with this protocol, dsRNA viruses could also be noticed. The contribution of RNA viruses can range from 38 to 63% [
30]. Nonetheless, it should be highlighted that the percentage and contribution of the RNA viruses to the total viral population is highly dependent on the environment [
30,
31]. It is evident that our understanding of RNA viruses is far from complete. Broadly speaking, the gut’s RNA virome has been explored to a lesser extent than the DNA virome. This is primarily attributed to the lack of stability RNA viruses seem to have in samples when compared to DNA viruses, making their identification through metagenomic sequencing challenging [
32].
Protocol 3 was the protocol with the greatest number of viruses without complete known classifications. Notably, the identification of dsRNA viruses through Protocol 3 was considerably diminished when compared to Protocol 2. Previous studies have shown that this type of extraction protocol should be efficient in the detection of RNA viruses [
33,
34]. In 2022, Chrzastek and collaborators were able to complete the whole-genome assembly of SARS-CoV-2 and influenza A from a sample containing a mixture of viruses, and the same was accomplished when applied to avian RNA viruses [
34].
The intestinal virome taxa were studied at the phylum and family level. The most abundant intestinal viruses found in this study were bacteriophage members of the group
Caudoviricetes (dsDNA), regardless of the protocol utilized. These dsDNA viruses are collectively known as “tailed bacteriophages” [
35], and could, under specific conditions, alter the intestinal bacterial population, diminishing beneficial bacteria and initiating intestinal inflammation [
36]. Furthermore,
Caudoviricetes represents the majority of phage sequences described to date [
27].
Furthermore, other viral families such as
Picornaviridae (ssRNA),
Alphaflexiviridae (ssRNA),
Virgaviridae (ssRNA), and
Microviridae (ssDNA) have strong representation in the human gut virome, according to the Gut Virome Database (GVD) built by Gregory and colleges [
14]. In this study, the database was created from 2697 human gut metagenomes derived from 1986 samples from individuals encompassing 16 countries and extracted by VLP or bulk methods. According to the results, 97.7% of the GVD corresponded to bacterial viruses, and only 2.1% to eukaryotic viruses. In this last group,
Picornaviridae,
Alphaflexiviridae, and
Virgaviridae could be found at the top of the list of the most common viral species. It should be mentioned that the latter two families correspond to plant viruses, but their presence in the human virome is frequent [
37].
Microviridae was one of the few single-stranded DNA phages to be identified.
At the phylum level, both
Alphaflexiviridae and
Virgaviridae belong to the phylum
Kitrinoviricota, while
Picornaviridae belongs to the phylum
Pisuviricota. Both groups include multiple species of viruses that are found in a broad range of environments. A recent phylogenetic study [
38] was performed to deeply analyze the RNA virome, and the researchers discovered new clades belonging to both phyla. They also mentioned that
Kitrinoviricota and
Pisuviricota showed distinctive genetic and phylogenetic features, so it was possible for them to be related evolutionary groups that are kept apart from other RNA bacteriophages and other RNA viruses. Lastly, the vast majority of single-stranded DNA viruses corresponded to the family
Microviridae and the phylum
Phixviricota, icosahedral bacteriophages compromising ssDNA viral species that are utilized as model systems for studying morphogenesis and the evolution of assembly [
39].
To determine the reproducibility of each protocol, beta diversity results were compared between replicates. No significant differences were found when comparing Protocol 1 and Protocol 2. However, in comparison to Protocol 3, both Protocol 1 and Protocol 2 exhibited considerably higher reproducibility. Protocol 1, distinguished by its simpler procedure without a PCR amplification step, subjected the samples to minimal manipulation. Protocol 2 was a virus-specific extraction protocol that included a retro-transcription step; nonetheless, the handling of the sample was kept to a minimum. Finally, Protocol 3 contained several amplification steps, increasing the probability of sample contamination. Notably, the amplification primer used in Protocol 3 was a random primer, which might have been responsible for the increase in uncharacterized viruses found using this protocol. Despite the relatively wide use of SISPA-based methods for the characterization of the human virome [
40,
41,
42] to address limitations inherent to virome determination, such as the lack of universal genes equivalent to the prokaryotic 16S rRNA gene, our results on the low replicability of the methodology based on this protocol urge us to be cautious regarding the reliability of this methodology. Theoretically, SISPA-based protocols, such as Protocol 3 in our study, because of their inconsistent reproducibility, require several replicates to be carried out in parallel and pooled. This approach aims to address the significant variability observed among replicates, ultimately aiming for a more precise representation of the actual virome. But, even so, that would still not necessarily reflect the actual composition of the virome due to the randomness of the amplifications in each replica. Even if we were to pool several replicas, the proportions of the different viruses are already distorted and biased. The main advantage of pooling them is that, with the mixture, we can increase the range of viruses detected, but how much exists of each of them would not be quantifiable.
Finally, the heatmap revealed distinct patterns of families’ enrichment or depletion among protocols. For instance, in Protocol 1, the most enhanced groups were Caudoviricetes and Ackermannviridae, while others, such as Anelloviridae, Picobirnaviridae, and Virgaviridae, were almost depleted. Regarding Protocol 2, Picobirnaviridae and Steigviridae were the most enriched families, at the expense of Ackermannviridae and Herelleviridae. Finally, the enriched families in Protocol 3 included those stated in Protocol 2, as well as the families Herelleviridae and Virgaviridae. It seemed that the patterns for depleted and enriched families between Protocol 1 and Protocol 2 were opposite, while Protocol 2 and Protocol 3 presented similar outcomes. Indeed, studying the maintenance of data between individuals and replicas, both Protocol 2 and Protocol 3 were more unstable than Protocol 1, where preserved results between individuals and duplicates could be observed. These findings were also shown in our sample clustering analysis, where replicas in Protocol 1 were clustered together, whereas Protocol 2 and Protocol 3 duplicates were more widely spread.