Novel Virus Identification through Metagenomics: A Systematic Review

Metagenomic next-generation sequencing (mNGS) allows the evaluation of complex microbial communities without isolating and culturing each microbial species, and does not require prior knowledge of the microbial sequences present in the sample. Applications of mNGS include virome characterization, new virus discovery and full-length viral genome reconstruction, either from virus preparations enriched in culture or directly from clinical and environmental specimens. Here, we systematically reviewed studies that describe novel virus identification through mNGS from samples of different origin (plant, animal and environment). Without imposing time limits on the search, 379 publications were identified that met the search parameters. Sample types, geographical origin, enrichment and nucleic acid extraction methods, sequencing platforms, bioinformatic analytical steps and identified viral families were described. The review highlights mNGS as a feasible method for novel virus discovery from samples of different origins, describes the heterogeneous experimental and analytical protocols currently in use, and provides practical information such as the commercial kits used for nucleic acid purification and the bioinformatic analysis pipelines.


Introduction
Viruses are the most abundant biological entities on Earth [1] and play a key role in the ecosystems in which they reside [2]. Virus interactions affect their own abundance and evolution [3]; furthermore, they have a deep impact on host individuals, populations and communities [4,5], as well as on environmental biogeochemical cycles [6].
Before the development of mNGS, virus discovery was challenging because viral genomes are highly variable: the lack of shared sequences hindered the use of an amplicon-based strategy [20], such as the 16S rRNA gene sequencing that can be exploited for bacterial analysis [21]. Shotgun mNGS overcame this challenge by providing untargeted sequencing of all the microbial genomes present in the sample; the resulting reads can then be classified based on their similarity to reference genomes [9,20].
In recent years, metagenomics applications have expanded into several fields. mNGS moved from research to clinical laboratories, changing the approach to infectious disease diagnosis and treatment, as well as improving the analysis of cancer-associated viruses [10]. Furthermore, mNGS revolutionized virome ecology studies, which previously analyzed
Figure 1. Flowchart of the literature search. The search was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
The earliest included article was published in 2008, with a gradual increase in subsequent years, reaching an apparent plateau with up to 54 articles in 2020. After 2013, there was a sharp increase in the number of articles concerning animal viruses, which accounted for 73.9% of the total, compared with 9.5% and 16.4% for viruses identified in plants and environmental samples, respectively (Figure 2a,b).
The number of samples analyzed in the published literature was highly variable, ranging from 1 to over 200,000 (mosquito specimens [37]). The most frequent sample size was 11-100 (n = 131 papers). A total of 107 papers had a sample size ≤10, 51 analyzed between 101 and 500 samples, and 43 analyzed >500 samples. In addition, 43 papers did not state the sample size, and 14 papers analyzed sample sets already available in databases (Figure 3a).
From a geographical point of view, the studies were carried out in all six World Health Organization (WHO) regions, with 36% in the Region of the Americas (n = 123), 29% in the South-East Asia Region (n = 99) and 26% in the European Region (n = 90), followed by the African Region (6%, n = 21), the Western Pacific Region (2%, n = 6) and the Eastern Mediterranean Region (n = 1) (Figure 3b). A total of 30 studies did not indicate the exact origin of the samples, while samples taken from the Arctic (n = 2), the Antarctic (n = 3) and the oceans (n = 8) were not included in the WHO classification.
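The regional percentages above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the denominator is the 340 studies assigned to a WHO region (the sum of the six counts), excluding the 30 studies of unspecified origin and the 13 polar or oceanic studies; under that assumption the rounded shares match the reported values, and the single Eastern Mediterranean study rounds to 0%. The names `REGION_COUNTS` and `region_shares` are illustrative.

```python
# Study counts per WHO region, as reported in the text.
# Assumption: studies without a WHO region (unspecified origin, Arctic,
# Antarctic, oceans) are excluded from the denominator.
REGION_COUNTS = {
    "Americas": 123,
    "South-East Asia": 99,
    "Europe": 90,
    "Africa": 21,
    "Western Pacific": 6,
    "Eastern Mediterranean": 1,
}


def region_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each region's share (%) of the studies assigned to a WHO region."""
    total = sum(counts.values())
    return {region: 100 * n / total for region, n in counts.items()}


if __name__ == "__main__":
    for region, pct in region_shares(REGION_COUNTS).items():
        print(f"{region}: {pct:.0f}%")
```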
Regarding sequencing platforms, the studies used four sequencing methods: Illumina, Sanger, Ion Torrent and Roche 454. The most utilized platform was Illumina (75%, n = 285), followed by Roche 454 (12%, n = 45), Sanger sequencing (6%, n = 22) and Ion Torrent (5%, n = 18). Seven records (2%) analyzed public viral metagenomes deposited in databases (such as NCBI RefSeq, the Sequence Read Archive (SRA) and the ICTV database) [179][180][181][182][183][184][185], and four records did not specify which sequencing platform was used [124,128,129,186] (Figure 5a). Illumina has been the most used platform since 2013 (Figure 5b).
Lastly, for the identification of viral genomes, 62% of the articles (n = 234) used a strategy based on overlapping reads and the generation of "contigs". This strategy, named de novo assembly, allows the reconstruction of the original genome starting from the sequenced fragments. Among the 234 articles, 77% concerned samples of animal origin, 12% samples of environmental origin and 10% samples of plant origin. A progressive increase in the use of de novo assembly was observed from 2016 to 2019 (Figure S1).
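The overlap idea behind de novo assembly is commonly implemented with de Bruijn graphs, in which reads are broken into k-mers that overlap by k-1 bases; widely used metagenomic assemblers such as MEGAHIT and metaSPAdes follow this approach. The toy sketch below assumes error-free reads from a single strand, a hypothetical k = 5 and made-up reads; it builds the graph and spells out non-branching paths as contigs, and is not how any specific assembler is implemented.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: dict[str, set[str]]) -> list[str]:
    """Spell out maximal non-branching paths (contigs) of the graph.

    Simplifications: isolated cycles are ignored, and there is no error
    correction, coverage filtering or reverse-complement handling.
    """
    indegree: dict[str, int] = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node: str) -> bool:
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):  # sorted for deterministic output
        if one_in_one_out(node):
            continue  # interior node: reached while extending from a branch point
        for nxt in sorted(graph.get(node, ())):
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


# Hypothetical overlapping reads sampled from the toy sequence ATTAGACCTGCCGGAATAC
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(unitigs(build_de_bruijn(reads, k=5)))
```

Because no (k-1)-mer repeats in this toy sequence, the graph is a single path and the four reads collapse into one contig; repeats, sequencing errors and uneven coverage are what make real metagenomic graphs branch.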

Overview of the Extracted Characteristics
For each included article, it was then possible to collect and present data regarding first author, date of publication, host, sample size and provenance, type of specimen, enrichment strategies, nucleic acid purification kits, retro-transcription, sequencing platforms, de novo assembly, genome, viral family and name of the novel virus identified (Table S2).
By way of example, a graphical overview of the main characteristics extracted from articles published in 2021 is provided (Figure 6). Each paper appears along the x-axis (numbers refer to the list of references provided in Table S2) and is represented by a column describing its characteristics. Color coding identifies the sample type, origin and size; the methods used for enrichment, purification, sequencing and analysis; and the viral genome.
Figure 6. Each column represents one paper, and numbers refer to the list of references provided in Supplementary File S1. Each row describes one specific characteristic (from top to bottom): sample, enrichment method, sample size, viral genome, sequencing platform, de novo assembly, retro-transcription (RT), provenance of the sample (WHO classification) and extraction method. Categories and color codes for each data field are indicated in the figure legend. Gray boxes indicate that the technique was not performed or the information was not provided.

Viral Genomes and Viral Families in Different Sample Types
Among the articles describing novel viruses in animal samples, 270 viruses were identified as belonging to known viral families. In particular, Parvoviridae, Picornaviridae, Circoviridae, Anelloviridae, Reoviridae, Astroviridae, Flaviviridae, Rhabdoviridae, Papillomaviridae and Dicistroviridae are the most represented families (Figure 7a); for 10 publications, the family of the novel virus could not be identified [125,130,180,181,[187][188][189][190][191][192]. Among the articles describing novel viruses in the environment, 60 viruses were identified as belonging to known viral families; in particular, Siphoviridae, Myoviridae, Podoviridae, Microviridae, Phycodnaviridae, Mimiviridae, Picornaviridae, Circoviridae, Herpesviridae and Hepeviridae are the most represented (Figure 7b); for two publications, the family could not be identified [193,194]. Among the articles describing novel viruses in plants, all 36 described novel viruses belonging to known viral families, with Geminiviridae, Tombusviridae, Potyviridae, Luteoviridae, Bromoviridae, Closteroviridae, Genomoviridae, Partitiviridae, Tymoviridae and Narnaviridae being the most represented families, together with an "Unclassified" category (Figure 7c).
In total, 681 different novel viruses were identified: 290 with a DNA genome, 348 with an RNA genome and 43 unclassified, as defined in the NCBI Taxonomy and ViralZone databases. In particular, 516 novel viruses were identified in animal samples, 110 in environmental samples and 55 in plants. Analysis by sample type revealed that the distribution of classified viral genomes was not homogeneous: RNA viruses prevailed over DNA viruses in plants (RNA 71% vs. DNA 25%) and animals (RNA 56% vs. DNA 38%), whereas DNA viruses prevailed over RNA viruses in environmental samples (DNA 74% vs. RNA 16%) (Figure 8).

Novel Viruses Found by mNGS Studies
Among the viruses identified in animal samples, new bacteriophages were described [195]. Bacteriophages are implicated in the dynamics and diversity of bacterial populations in a number of ecosystems, including the human gut [196][197][198], and these findings confirm that mNGS technologies allow investigation of the so-called "viral dark matter" [199].
Viruses that are potentially pathogenic and may pose a zoonotic risk to humans were also identified. For example, new Bocaparvoviruses were identified in different animal species, such as alpacas [200], wild squirrels [124] and tufted deer [121]. Novel Bocaparvoviruses have been identified in different geographical areas and animal species, including bats [201], camels [202], gorillas [203], marmots [204], pigs [205] and rodents [206], and are associated with various respiratory and gastrointestinal diseases in animals and with acute respiratory disease in humans. Their presence in previously unreported animal species may be relevant to possible zoonotic risks, particularly for species such as the alpaca, whose breeding outside its region of origin brings it into close contact with humans.
Among the viruses that infect plants, a new Grablovirus was identified in Prunus spp., demonstrating the potential of mNGS for diagnosing viral infections in economically important sectors such as agriculture. The appearance of these viruses in temperate and tropical woody plant species and in herbaceous plants is symptomatic of the consequences of climate change, since viruses of the Geminiviridae family have typically been pathogens of economically important plant species, mainly in the tropical and subtropical regions of the world [207]. The Begomoviruses, which also belong to the family Geminiviridae, infect dicotyledonous plants and have a major economic impact on tomato cultivation. A new Begomovirus associated with severe symptoms in tomatoes was identified in Brazil, where the preferred strategy for managing Begomoviruses in tomatoes is the use of cultivars carrying disease resistance/tolerance genes, such as Ty-1 [208]. This new Begomovirus suggests a mechanism of potential adaptation to the Ty-1 tolerance factor, which highlights the potential drawbacks of relying on virus-specific resistance in tomato breeding [209].
Among the environmental viruses, one example is the identification of a new Phycodnavirus (family Phycodnaviridae) in a phytoplankton bloom in the West Antarctic Peninsula (WAP) [210]. This finding suggests that mNGS is useful for studying and monitoring plankton dynamics in response to climate change in this warming region [211]. Another study identified a new virus belonging to the Picobirnaviridae family in wastewater from Santiago de Chile, supporting the value of sewage viral metagenomics as a tool for epidemiological and public health surveillance [40].
Some examples of new viruses identified in different sample types are described in Table 1.

Bioinformatics Pipelines
Although the published articles use different algorithms and reference databases for metagenomic analysis, they share four common steps in the bioinformatic analysis process: (a) quality control (QC) check; (b) read trimming, adapter removal and further filtering; (c) viral genome identification; and (d) analysis of the results [19].

Quality Control (QC) Check, Sequence Trimming and Filtering
Quality control (QC) is a critical step in processing NGS data: starting from the raw sequences generated by the sequencing platform, it aims to produce high-quality data for the algorithms used in the subsequent analysis phases (raw-read processing). The programs used in this phase check the quality of the data, remove adapter and index sequences (sequences artificially added to the reads of each sample so that fragments can be assigned to their sample of origin), filter out low-quality sequences and, in some cases, filter out polyclonal sequences. The QC protocol should be tailored to each dataset, taking into account the differences between sequencing technologies (short-read vs. long-read platforms) [229].
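As an illustration of raw-read processing, the sketch below applies adapter removal, sliding-window quality trimming and a minimum-length filter to a single FASTQ record. It is a simplified, hypothetical example: the Phred+33 encoding, the adapter sequence (the common Illumina adapter prefix AGATCGGAAGAGC), the 4-base window, the Q20 threshold and the 30-base minimum length are assumptions chosen for illustration. Dedicated tools such as Trimmomatic, Cutadapt or fastp implement these steps more robustly, for example with error-tolerant adapter matching.

```python
from typing import Optional

# Assumed adapter: the common Illumina adapter prefix.
ADAPTER = "AGATCGGAAGAGC"


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER,
                 min_match: int = 5) -> tuple[str, str]:
    """Cut at the first full adapter match, or at a partial adapter
    (at least min_match bases) running off the 3' end of the read."""
    cut = seq.find(adapter)
    if cut == -1:
        cut = len(seq)
        for k in range(min(len(adapter), len(seq)), min_match - 1, -1):
            if seq.endswith(adapter[:k]):
                cut = len(seq) - k
                break
    return seq[:cut], qual[:cut]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_mean: float = 20) -> tuple[str, str]:
    """Truncate the read at the start of the first window whose mean quality is below min_mean."""
    scores = phred_scores(qual)
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], qual[:start]
    return seq, qual


def clean_read(seq: str, qual: str, min_length: int = 30) -> Optional[tuple[str, str]]:
    """Adapter trimming, then quality trimming, then a minimum-length filter (None = discarded)."""
    seq, qual = trim_adapter(seq, qual)
    seq, qual = sliding_window_trim(seq, qual)
    return (seq, qual) if len(seq) >= min_length else None


# Usage with a made-up read: 40 insert bases followed by the adapter, all at Q40 ('I').
read = "ACGTACGTAC" * 4 + ADAPTER
print(clean_read(read, "I" * len(read)))
```

Adapter removal runs before quality trimming here so that the quality window is evaluated only on insert bases; the order and thresholds are design choices that, as noted above, should be adapted to each dataset and platform.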

Viral Genome Identification
The next step is the removal of high-quality reads derived from the host genome by alignment to the host reference genome, using common alignment tools (for example Bowtie2 [236], STAR [237], BLAST [238] and BWA [239], with SAMtools [240] used to process the resulting alignments). The high-quality reads remaining after host filtering can then be used to identify viral genomes through two different strategies: (1) alignment of reads to a reference sequence database (usually the NCBI nucleotide database (nt) [241], the non-redundant protein database (nr) [241] or the Reference Viral Database (RVDB) [242]); or (2) de novo assembly, which relies on overlaps between reads rather than on mapping reads to reference genomes.
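The review describes host removal by alignment (e.g., with Bowtie2 or BWA). The sketch below swaps in a simpler technique, exact k-mer matching, to show the underlying idea: reads that share many k-mers with the host genome (on either strand) are discarded, and the rest are kept for viral identification. The function names, k = 21 and the 0.5 threshold are hypothetical choices; an alignment-based pipeline against the full host genome would tolerate sequencing errors and divergence far better.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_host_reads(reads: list[str], host_genome: str, k: int = 21,
                      max_host_fraction: float = 0.5) -> list[str]:
    """Keep reads whose fraction of k-mers shared with the host (either strand)
    does not exceed max_host_fraction."""
    host_kmers = kmers(host_genome, k) | kmers(reverse_complement(host_genome), k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k: cannot be screened, drop it
            continue
        shared = len(read_kmers & host_kmers) / len(read_kmers)
        if shared <= max_host_fraction:
            kept.append(read)
    return kept
```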
In the first strategy, since amino acid sequences are more conserved than nucleotide sequences, aligning reads to a protein database rather than a nucleotide database improves the sensitivity of classification; however, it requires more computational power and time. De novo assembly, instead, reconstructs the original genome from the sequenced fragments (generally 100 to 250 bp long) by producing longer assembled sequences called "contigs" and "scaffolds". Metagenomic assembly is a very complex process that requires highly uniform coverage along each genome and, when several viruses are present in the sample, across different genomes [243]. The most used tools for this process are listed in Table 3.
Table 3. Most-used tools for metagenomic assembly.
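The sensitivity argument for protein-level comparison can be illustrated by translating reads, which is what translated searches such as BLASTx and DIAMOND do before comparing a nucleotide query against a protein database. In the sketch below, two made-up coding sequences differ only by synonymous substitutions: their nucleotide identity is about 67%, yet they encode the same peptide. It shows only the translation step (standard genetic code), not alignment or scoring, and it also hints at the extra cost, since each read yields six candidate protein sequences.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons in frame 0; codons not in the table (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> list[str]:
    """Translate the three forward and three reverse-complement reading frames."""
    rev = seq.translate(COMPLEMENT)[::-1]
    return [translate(s[f:]) for s in (seq, rev) for f in range(3)]


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical sequences differing only by synonymous substitutions
seq_a = "ATGGCTGCAAAACTG"
seq_b = "ATGGCCGCGAAGTTA"
print(identity(seq_a, seq_b), identity(translate(seq_a), translate(seq_b)))
```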

Analysis of the Results
It is necessary to verify the viral origin of the contigs and/or scaffolds produced in the previous step, especially in mixed samples, in which other organisms may be present in addition to the virus of interest. This can be done by comparing sequences with reference databases (such as GenBank non-redundant (NR), nucleotide (NT), RefSeq viral and UniProt viral) or with custom viral databases generated in-house, using BLASTx (for comparison with protein databases), BLASTn (for comparison with nucleotide databases) and DIAMOND [260] (translated protein search mode). Alternatively, HMMER [261] (http://hmmer.org/, accessed on 28 June 2022), which is based on profile hidden Markov models, can be used instead of traditional BLAST-based approaches to detect true homologs, since it accounts for the fact that certain positions in a sequence alignment are more likely than others to contain an insertion or a deletion [262]. Finally, the Sequence Demarcation Tool can be used to verify the results of the previous analysis with a graphical approach that makes it easier to identify the sequences with the strongest similarity.
The nucleotide or reconstructed amino acid sequence of the virus can also be used for phylogenetic analysis. This is done by aligning the viral genome sequence with other reference sequences (ideally of similar length), and the result provides information about homology with viruses of different species. This can be very important in public health, for example in the discovery of viruses responsible for new outbreaks, since a potentially pathogenic virus can be recognized, and its epidemiological links investigated, by comparing its genetic sequence with those of other pathogens.
The most used multiple sequence alignment algorithms for this analysis are CLUSTAL, MUSCLE, MAFFT and T-Coffee [263] (available from: https://www.ebi.ac.uk/Tools/msa/, accessed on 28 June 2022), while MEGAN [264], in addition to taxonomic analysis, also allows functional analysis, graph generation, clustering and network analysis.
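Multiple sequence alignment programs such as CLUSTAL and MAFFT build on pairwise alignment; progressive methods, for instance, score pairs of sequences and then merge alignments along a guide tree. The Needleman-Wunsch global alignment below is a minimal sketch of that pairwise step, using a simple linear scoring scheme (match +1, mismatch -1, gap -1) chosen for illustration rather than the substitution matrices and affine gap penalties that these tools use.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns the optimal score and one optimal alignment ('-' marks gaps).
    """
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATCA"))
```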
Once the complete (or nearly complete) genomic sequence has been obtained, putative open reading frames (ORFs) can be identified by computational prediction. The main algorithms used for this purpose are NCBI ORFfinder, Glimmer (Gene Locator and Interpolated Markov ModelER) and Geneious. The relationships between the identified viral sequence and its common ancestors [265], or between sequences thought to contain genes in order to infer their function [266], are estimated by building phylogenetic trees with different methods (Neighbor Joining, UPGMA, Maximum Parsimony, Bayesian Inference and Maximum Likelihood (ML)) [267] and programs (MEGA, PhyML and IQ-TREE).
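A minimal ORF scan can be sketched as follows: it searches all six reading frames for an ATG start codon followed by an in-frame stop codon of the standard genetic code. This is not the algorithm of NCBI ORFfinder, Glimmer or Geneious (Glimmer, for example, uses interpolated Markov models); the function name and the default minimum length of 100 codons are hypothetical choices for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, str]]:
    """Return (strand, start, end, sequence) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based and end-exclusive on the strand searched
    (minus-strand coordinates refer to the reverse complement). Nested ATGs
    inside an ORF are not reported separately, and ORFs without a downstream
    in-frame stop codon are skipped.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] == "ATG":
                    for j in range(i, len(s) - 2, 3):
                        if s[j:j + 3] in STOP_CODONS:
                            if (j - i) // 3 >= min_codons:
                                orfs.append((strand, i, j + 3, s[i:j + 3]))
                            i = j  # resume scanning after this stop codon
                            break
                    else:
                        break  # no in-frame stop downstream: stop scanning this frame
                i += 3
    return orfs


# Usage with a made-up sequence and a low length threshold
print(find_orfs("CC" + "ATG" + "AAA" * 5 + "TAA" + "GG", min_codons=3))
```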

Discussion
mNGS is one of the most rapidly evolving fields of biology, broadening our understanding of the diversity, ecology and evolution of microbial communities from different habitats. Its application to the identification of new pathogens, or to monitoring known agents in clinical and environmental samples, makes it an instrument of choice in the One Health prevention approach. This approach is based on the awareness that human health is closely linked to that of animals and the environment, and also aims to reduce the risk of potential epidemics [268][269][270][271][272]. This review highlighted that the application of NGS technologies is currently feasible also in low- and middle-income countries, mainly thanks to international collaborations [45,49,104,151,[273][274][275]. In these regions, costs associated with infrastructure, equipment, reagents and expertise could pose serious challenges to the use of NGS for pathogen identification [276]; one possible way to overcome this limitation may be the establishment of international omics networks. mNGS allows primer-independent, unbiased detection of the virome and the reconstruction of full-length viral genomes, even for unknown or poorly characterized viruses [17,277,278], including bacteriophages, providing an unprecedented opportunity for the discovery of novel viruses [180]. Since viruses lack universally conserved sequences, the virome can be defined only through shotgun metagenomic sequencing of the entire microbial community.
This review summarizes previous studies on the use of mNGS to identify new viruses in different types of samples (animal, plant and environmental), through a systematic review of the literature [279,280]. The processes involved in these studies include sample processing for nucleic acid purification, sequencing and bioinformatic data analysis. For each of these aspects, the literature analysis highlighted heterogeneous approaches that prevent direct comparison of the results but nonetheless allow the identification of new viruses from different matrices using a plethora of strategies, responding flexibly to different research questions.
As regards nucleic acid purification, the methods used are based on commercial kits that yield comparable purified microbial communities [281,282]. The kits used are indicated for the purification of DNA, RNA or both; however, it has been shown that an RNA virus (norovirus) can be identified by mNGS in biological samples such as feces from nucleic acids obtained with a DNA purification kit, with a retro-transcription step before library preparation [178]. For environmental samples, different pre-purification enrichment strategies have been identified, based on tangential flow filtration, pore filtration, PEG precipitation, FeCl3 precipitation, ultracentrifugation, fluidic circuits, chemical flocculation and syringe filtration, and it has previously been shown that concentration protocols influence viral richness, viral specificity, viral pathogen detection and viral community composition in metagenomic analyses [283].
It has been shown that pre-extraction enrichment methods can introduce bias in the identification of the microbial species present in a sample [283]. For example, filtration can reduce the abundance of bacterial [284] or viral species [285,286], depending on pore size. Similarly, enrichment based on low-force centrifugation can deplete large viruses [285]. For the identification of viral species from wastewater, PEG-based concentration gave better recovery of adenovirus but lower recovery efficiency of human rotavirus A compared with methods based on charged membranes or glass wool [287], while enrichment methods based on microfluidic devices have proven effective in characterizing airway microbiomes [288].
Life 2022, 12, 2048
To improve shotgun metagenomics results, some post-extraction enrichment technologies have also been developed: they aim to reduce host nucleic acid sequences, thereby enriching samples for microbial genomes. These methods can increase the number of microbial reads obtained from sequencing, improving the number of species and taxa detected and the coverage of each sequence, and allowing detection of less abundant species. Furthermore, they could reduce the costs associated with mNGS, since more samples can be analyzed in the same sequencing run. Some enrichment technologies used to remove human DNA from samples have been described [289,290]. In addition, some commercial kits are available, and their efficiency in human DNA depletion has been tested and compared [289,[291][292][293]. However, these post-extraction enrichment methods could bias the phylogenetic composition of samples through the nonspecific removal of some bacterial DNA [289][290][291][293].
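The cost argument above is simple arithmetic: at a fixed run output, the number of microbial reads scales with the microbial fraction of the library, so raising that fraction increases the number of samples that can share a run while each still receives enough microbial reads. All numbers below (20 million reads per run, 30,000 microbial reads per sample, a microbial fraction of 0.5% before and 5% after depletion) are hypothetical, and the model assumes reads are split evenly among samples and ignores the composition bias that depletion can introduce.

```python
import math


def microbial_reads(total_reads: int, microbial_fraction: float) -> float:
    """Expected number of reads of microbial origin at a given depth."""
    return total_reads * microbial_fraction


def max_samples_per_run(run_output_reads: int, microbial_fraction: float,
                        min_microbial_reads_per_sample: int) -> int:
    """Samples that can share one run while each still receives the target
    number of microbial reads (assumes reads are split evenly among samples)."""
    per_run = microbial_reads(run_output_reads, microbial_fraction)
    return math.floor(per_run / min_microbial_reads_per_sample)


# Hypothetical scenario, before and after host depletion
RUN_OUTPUT = 20_000_000
TARGET = 30_000
before = max_samples_per_run(RUN_OUTPUT, 0.005, TARGET)
after = max_samples_per_run(RUN_OUTPUT, 0.05, TARGET)
print(before, after)
```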
With regard to library preparation methods and sequencing platforms, a decrease in heterogeneity has been observed over the years, thanks to the gradual increase in the use of Illumina, which is currently the most widespread among the NGS methods available.
Millions of sequence reads are generated from a single run and must be analyzed with dedicated software and bioinformatic pipelines to produce meaningful results [19,243,294,295]. Although the published articles use different algorithms and reference databases for metagenomic analysis, they share common steps in the bioinformatic analysis process; these steps, described in detail above, allow the identification of known and new viruses in the samples.
However, erroneous chimeric sequences could be generated from multiple related viruses present in the same sample, or even between viral and nonviral sequences. It has been suggested that bioinformatic pipelines including a chimera-checking step could improve the quality of sequencing data [296]. Another possible approach to improving metagenome assemblies from microbial community samples is the use of sequencing technologies that provide long-range information [297,298], such as Oxford Nanopore Technologies (ONT, Oxford, UK) [299], Pacific Biosciences (PacBio) [300], 10X Genomics (Pleasanton, CA, USA) [301] and Hi-C [298]. Moreover, hybrid approaches combining long-range and short-read technologies could improve data quality [302].
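The principle behind a chimera check can be illustrated with a deliberately simple heuristic, which is an assumption of this sketch and not the method of any cited pipeline: split a contig in half, find the reference that shares the largest fraction of k-mers with each half, and flag the contig when the two halves are each well supported by different references. The reference names and sequences are made up, and k = 15 and the 0.8 support threshold are arbitrary defaults; dedicated chimera-detection tools rely on alignments and more refined models.

```python
from typing import Optional


def kmer_set(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(fragment: str, references: dict[str, str],
                   k: int) -> tuple[Optional[str], float]:
    """Reference sharing the largest fraction of the fragment's k-mers."""
    frag = kmer_set(fragment, k)
    if not frag:  # fragment shorter than k
        return None, 0.0
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = len(frag & kmer_set(ref, k)) / len(frag)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def looks_chimeric(contig: str, references: dict[str, str], k: int = 15,
                   min_score: float = 0.8) -> bool:
    """Flag a contig whose two halves are each well supported by different references."""
    mid = len(contig) // 2
    left, left_score = best_reference(contig[:mid], references, k)
    right, right_score = best_reference(contig[mid:], references, k)
    return (left is not None and right is not None and left != right
            and left_score >= min_score and right_score >= min_score)


# Made-up references and a contig joining the start of each
refs = {"virusA": "ACGTACGGTCAGTTCAGGCATCGATCGGA",
        "virusB": "TTGCCAAGTGGCTAGCTAAGCTTGCAACG"}
print(looks_chimeric(refs["virusA"][:14] + refs["virusB"][:14], refs, k=5))
```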
Although few published reports have evaluated the sensitivity of mNGS in identifying viral species, sensitivity is known to vary with the sequencing platform, the sequencing depth and the type of virus present in the sample. Frey et al. [303] found that the sensitivity of both Illumina MiSeq and Ion Torrent PGM for identifying influenza A virus in blood samples was 10⁴ genome copies/mL, using an Ion 314 chip for the Ion Torrent PGM with an output of 30-50 million single-end reads, and a 300-cycle kit for the Illumina MiSeq with an output of 24-30 million paired-end reads. Be et al. [304] found that the sensitivity of the Illumina GA IIx for identifying B. anthracis in aerosol and soil DNA extracts was 10 and 10² genome equivalents (GEs), respectively, with a depth of 37.5 million single-end reads/sample (one sample per lane). Bukowska-Osko et al. [305] detected HIV in HIV RNA-positive cerebrospinal fluid (CSF) samples containing at least 10² viral copies/mL, and HSV-1 DNA in CSF samples containing at least 10³ viral copies/µL, using an Illumina HiSeq 1500 with a depth of about 33 million reads/sample.
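The dependence of sensitivity on sequencing depth can be made concrete with a simple sampling model, which is an assumption of this sketch rather than something taken from the cited studies: if each read is independently viral with a small probability, the number of viral reads is approximately Poisson-distributed, and the probability of observing at least a minimum number of viral reads grows with depth. The viral read fraction (1 in 10 million), the depths and the three-read detection criterion are hypothetical, and the model ignores platform-specific biases, losses during library preparation and genome size.

```python
import math


def expected_viral_reads(sequencing_depth: int, viral_read_fraction: float) -> float:
    """Expected viral reads, assuming each read is independently viral with this probability."""
    return sequencing_depth * viral_read_fraction


def detection_probability(expected_reads: float, min_reads: int = 1) -> float:
    """P(at least min_reads viral reads) under a Poisson model with mean expected_reads."""
    p_fewer = sum(math.exp(-expected_reads) * expected_reads ** i / math.factorial(i)
                  for i in range(min_reads))
    return 1.0 - p_fewer


# Hypothetical scenario: viral reads make up 1 in 10 million reads of the library
for depth in (5_000_000, 30_000_000):
    lam = expected_viral_reads(depth, 1e-7)
    print(depth, round(detection_probability(lam, min_reads=3), 3))
```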
In conclusion, despite the lack of standard protocols from the sampling phase to the production of interpreted data, the different methods currently available at each stage of the process offer effective tools for exploring Earth's virome (animal, plant and environmental). Moreover, this lack of standardization seems well suited to exploring sequences that are generated by sequencing but cannot be attributed to any known organism or sequence. In fact, the continuous development of new analysis tools also allows the re-analysis of previously generated high-throughput sequencing datasets, enabling the detection of new viral sequences, including pathogenic ones, not previously recognized in the samples analyzed. Although the lack of standardized approaches currently constrains the use of this technology in regulatory settings, for example in the monitoring of zoonoses or water quality by official control bodies, which require standardized processes [33], mNGS technologies have proven suitable and crucial for tracking novel SARS-CoV-2 hosts, evolution and spread patterns [17,24]. mNGS is characterized by the constant and very rapid development of new analytical methods, both laboratory and bioinformatic, which seem well suited to following the dynamics of pandemic events and of viral outbreaks in general. This suggests the opportunity to implement these technologies to establish early warning systems [74] and to design effective disease control and prevention strategies.