Characterization of Diverse Anelloviruses, Cressdnaviruses, and Bacteriophages in the Human Oral DNA Virome from North Carolina (USA)

The diversity of viruses identified from the various niches of the human oral cavity—from saliva to dental plaques to the surface of the tongue—has accelerated in the age of metagenomics. This rapid expansion demonstrates that our understanding of oral viral diversity is incomplete, with only a few studies utilizing passive drool collection in conjunction with metagenomic sequencing methods. For this pilot study, we obtained 14 samples from healthy staff members working at the Duke Lemur Center (Durham, NC, USA) to determine the viral diversity that can be identified in passive drool samples from humans. The complete genomes of 3 anelloviruses, 9 cressdnaviruses, 4 Caudoviricetes large bacteriophages, 29 microviruses, and 19 inoviruses were identified in this study using high-throughput sequencing and viral metagenomic workflows. The results presented here expand our understanding of the vertebrate-infecting and microbe-infecting viral diversity of the human oral virome in North Carolina (USA).

Although not directly associated with disease, microbe-infecting viruses are relevant to human health as they alter microbial abundance, diversity, and evolution within the body [5,9]. Thus, understanding the diversity and evolutionary history of both vertebrate-infecting and microbe-infecting viruses of the human oral virome will have connections to human health and disease in the broadest sense.
While projects such as the Cenote Human Virome Database (CHVD) and Oral Virus Database (OVD) have been successful at pooling thousands of oral viral sequences and genomes based on host geography, our body of knowledge about the human oral virome is notably incomplete though rapidly expanding [1,3,10]. This rapid growth is largely due to the use of varying sample collection methods in conjunction with increasingly popular metagenomics surveys. Passive drool collection techniques are commonly used for extracting high-quality genomic data from saliva [11]. This approach collects unstimulated saliva, thus reducing contamination associated with spitting, swabbing, and rinsing [11][12][13]. Although some studies have looked at the oral microbiome and passive drool techniques for bacterial composition, this has not been the case for the oral virome where most studies have either used swabbing techniques or have not described a specific collection technique [4,14,15]. The few studies that have used unstimulated saliva collection methods combined with viral metagenomics have shown success in detecting viruses across numerous viral families [5]. Even so, there is a notable geographic bias in these studies. For saliva samples taken in the United States of America (USA), the OVD includes virus sequences or genomes from just 109 individuals with 101 of these individuals sampled in northern California [3,16,17]. Consequently, there is a scarcity of data on the human oral virome for most of the USA. To assess the feasibility of identifying complete virus genomes in saliva samples of humans, we tested passive drool collection and used viral metagenomic sequencing methods to characterize viruses in human saliva from the southeastern USA, focusing on the staff at a captive-primate colony as a pilot project. This study is part of a larger project identifying viruses in humans and nonhuman primates at the Duke Lemur Center (Durham, NC, USA).

Sample Collection
Saliva samples were collected from healthy adult participants using the passive drool method and Saliva Collection Aid (Salimetrics, Carlsbad, California, USA). In this technique, the participant allows saliva to pool in the mouth and drip into the collection aid, which allows for easy self-collection of up to ~2 mL of whole saliva while minimizing contamination [11,13]. Fourteen saliva samples (n = 14) were obtained from individual participants working or volunteering at the Duke Lemur Center (Durham, NC, USA) between August 2021 and May 2022. Samples were frozen at −80 °C until viral DNA extraction. This study was approved by the Duke University Campus Institutional Review Board (IRB #2022-0009).

Viral Nucleic Acid Extraction, Sequencing, De Novo Assembly, and Virus Genome Identification
Viral DNA was extracted from 200 µL of passive drool sample from the participants individually using the High Pure Viral Nucleic Acid Kit (Roche Diagnostics, Indianapolis, IN, USA). Rolling circle amplification was performed using the Illustra TempliPhi Kit (GE Healthcare, Chicago, IL, USA) to preferentially amplify circular DNA in the samples. Illumina sequencing libraries were generated using the Illumina DNA Prep Kit (Illumina Inc., San Diego, CA, USA), and samples were sequenced on the Illumina NovaSeq 6000 (Illumina Inc., San Diego, CA, USA). Paired-end reads (2 × 150) were trimmed using Trimmomatic-0.39 [18]. Trimmed reads were de novo assembled with MEGAHIT v1.2.9 [19]. Diamond [20] BLASTx was used to analyze the assembled contigs against a viral RefSeq database (release 207; downloaded from NCBI in September 2021). Circular genomes were determined based on the terminal redundancy in the de novo assembled contigs.
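The terminal-redundancy criterion for circular genomes can be illustrated with a minimal Python sketch. It assumes an exact match between the first and last k bases of a contig; the function names and the default overlap bounds (20 to 500 nt) are hypothetical choices for illustration and are not the parameters used in this study.

```python
def terminal_overlap(seq: str, min_overlap: int = 20, max_overlap: int = 500) -> int:
    """Return the length of the longest exact overlap between the start and
    the end of a contig, or 0 if none of at least min_overlap is found.
    The default bounds are illustrative assumptions, not values from the study."""
    seq = seq.upper()
    upper = min(max_overlap, len(seq) // 2)
    # Scan from the longest candidate overlap down to the shortest.
    for k in range(upper, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(seq: str, min_overlap: int = 20, max_overlap: int = 500):
    """Trim the redundant 3' end when terminal redundancy is detected.
    Returns (sequence, is_circular)."""
    k = terminal_overlap(seq, min_overlap, max_overlap)
    if k:
        return seq[:-k], True
    return seq, False


if __name__ == "__main__":
    core = "ATGCGTACGTTAGCCTAGGATCCGATTACGGCATCGA"
    contig = core + core[:10]  # simulate an assembler reading past the origin
    print(circularize(contig, min_overlap=8))
```

Assemblers can introduce mismatches in the repeated ends, so a production check would likely tolerate a small number of differences rather than require an exact match.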

Distribution of Virus Genomes across the Samples
To identify the distribution of the viral genomes across samples, we first clustered viruses into virus operational taxonomic units (vOTUs) at a 98% identity threshold using SDT v1.2 [23]. We then mapped the reads from the Illumina sequencing of each sample to a representative genome of each unique vOTU using BBMap [25].
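The two steps described here, grouping genomes into vOTUs at a 98% identity threshold and then calling vOTU presence from read mapping, can be sketched in Python as follows. This is an illustration under stated assumptions: the single-linkage grouping, the function names, and the handling of bin boundaries are hypothetical and may differ from how SDT, BBMap, and the downstream analysis were actually applied. The coverage bins mirror those given in the Figure 1 caption (0-25%, 25-50%, 50-100%).

```python
from collections import defaultdict


def cluster_votus(names, identity, threshold=98.0):
    """Single-linkage grouping (an assumption): two genomes share a vOTU when
    their pairwise identity (percent) is >= threshold.
    identity maps (name_a, name_b) tuples to percent identity."""
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pid in identity.items():
        if pid >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def coverage_breadth(depths):
    """Fraction of genome positions covered by at least one mapped read."""
    return sum(1 for d in depths if d > 0) / len(depths)


def coverage_bin(breadth):
    """Bin breadth as in the coverage plot; boundary handling is an assumption."""
    if breadth > 0.5:
        return "high"          # treated as vOTU present
    if breadth >= 0.25:
        return "intermediate"
    return "low"


if __name__ == "__main__":
    ids = {("g1", "g2"): 99.1, ("g2", "g3"): 98.0, ("g1", "g3"): 97.0, ("g3", "g4"): 80.0}
    print(cluster_votus(["g1", "g2", "g3", "g4"], ids))
    print(coverage_bin(coverage_breadth([0, 3, 5, 7, 2, 0, 1, 4])))
```

Single linkage can chain genomes whose direct pairwise identity falls below the threshold (g1 and g3 in the example), which is one reason the clustering rule is flagged as an assumption.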

Anelloviruses
Genome sequences of viruses in the genera Alphatorquevirus and Betatorquevirus and representatives in Gammatorquevirus (to serve as the outgroup) of the Anelloviridae family were downloaded from GenBank in May 2023. The ORF1 gene from the available GenBank sequences along with the ORF1 gene of anelloviruses identified in this study were extracted and translated. ORF1 amino acid sequences were aligned using MAFFT v.7.113 [26]. The alignment was used to infer a maximum likelihood phylogenetic tree using PhyML 3.0 [27] with best-fit amino acid substitution model VT+F determined using ProtTest 3 [28]. Branches with <0.7 approximate likelihood-ratio test (aLRT) branch support values were collapsed with TreeGraph2 [29].

Cressdnaviruses
We extracted the Rep sequences that form clusters with those from this study as well as those from the established viral cressdnavirus families and the CRESS groups 1-6. The sequences in this dataset were aligned with MAFFT v7.113 [26], and the alignment was trimmed with TrimAL (0.2 gap threshold) [41]. The trimmed alignment of the Rep amino acid sequences was then used to infer a maximum likelihood phylogenetic tree with IQ-TREE 2 [42] (with Q.pfam+F+G4 as the best-fit amino acid substitution model) and aLRT branch support [43]. The phylogenetic tree was visualized with iTOL v6 [44].
For each cluster from the SSN that had Rep sequences of the viruses identified in this study, we aligned the Rep amino acid sequences using MAFFT v7.113 [26]. This alignment was used to infer maximum likelihood phylogenetic trees using PhyML 3.0 [27] with best-fit models determined using ProtTest 3 [28] (LG+I+G+F for CRESS6, RtREV+I+G+F for Cluster 2, and LG+I+G+F for Cluster 1). Branches with <0.8 aLRT support were collapsed with TreeGraph2 [29].
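Collapsing branches below an aLRT support threshold (0.7 or 0.8 in the analyses above) turns poorly supported internal nodes into polytomies. The following Python sketch shows the idea on a toy tree; the Node class and function names are hypothetical, branch lengths are ignored, and it is not a description of how TreeGraph2 implements the operation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # aLRT support of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float = 0.7) -> Node:
    """Replace every internal child whose support is below threshold with its
    own children, so the weakly supported split becomes a polytomy."""
    new_children = []
    for child in node.children:
        collapse_low_support(child, threshold)  # post-order: fix subtrees first
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node: Node) -> str:
    """Serialize topology and support values (no branch lengths, no trailing ';')."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


if __name__ == "__main__":
    tree = Node(children=[
        Node(support=0.95, children=[Node("A"), Node("B")]),
        Node(support=0.4, children=[
            Node("C"),
            Node(support=0.9, children=[Node("D"), Node("E")]),
        ]),
    ])
    print(to_newick(tree))
    print(to_newick(collapse_low_support(tree, threshold=0.7)))
```

In the example, the clade with support 0.4 is dissolved while its well-supported subclade (D,E) is retained and attached directly to the parent node.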

Microviruses
Complete genomes of microviruses available in GenBank were downloaded in May 2023. From the genomes and microvirus genomes identified in this study, major capsid protein (MCP) sequences were extracted. These, together with representative MCPs from members of the Bullavirinae sub-family (to serve as an outgroup), were translated and aligned with MAFFT v7.113 [26]. The alignment was then trimmed with TrimAL [41] (0.2 gap threshold) and used to infer a maximum likelihood phylogenetic tree using IQ-TREE 2 [42] (with Q.pfam+F+G4 as the best-fit amino acid substitution model). The tree was visualized using iTOL v6 [44].
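Gap-threshold trimming of an alignment can be sketched as a simple column filter. The Python example below assumes that the 0.2 threshold is read as the minimum fraction of sequences that must carry a residue (not a gap) for a column to be kept; this reading, the function name, and the dictionary input format are assumptions for illustration and should be checked against the TrimAL documentation.

```python
def trim_alignment(seqs: dict, gap_threshold: float = 0.2) -> dict:
    """Keep alignment columns in which the fraction of non-gap characters is
    at least gap_threshold (assumed interpretation of the threshold).
    seqs maps sequence names to equal-length aligned strings."""
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all aligned sequences must have the same length")
    n = len(names)
    keep = [
        i for i in range(length)
        if sum(seqs[name][i] != "-" for name in names) / n >= gap_threshold
    ]
    return {name: "".join(seqs[name][i] for i in keep) for name in names}


if __name__ == "__main__":
    aln = {"a": "A-C", "b": "A-C", "c": "G--"}
    print(trim_alignment(aln))  # the all-gap middle column is dropped
```

A low threshold such as 0.2 removes only columns that are mostly gaps, which retains more sites for divergent proteins at the cost of keeping some sparsely populated columns.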

Large Bacteriophages and Inoviruses
For the four large bacteriophages identified in this study, a proteomic tree of dsDNA bacteriophages was generated with ViPTree server version 3.1 (with auto gene prediction) [45]. For the inoviruses identified in this study, a custom database was generated of complete inovirus genomes available through GenBank, and this was used to infer a proteomic tree using ViPTree server version 3.1 (with auto gene prediction). Once the closest neighbors to the viruses identified in this study were determined, intergenomic distances within clades were calculated using VIRIDIC [24]. CheckV [46] was used to verify the completeness of annotated phage genomes.

Results and Discussion
Our passive drool sampling approach coupled with viral metagenomic workflows for DNA viruses resulted in the identification of genomes of diverse anelloviruses (n = 3), cressdnaviruses (n = 9), Caudoviricetes bacteriophages (n = 4), microviruses (n = 29), and inoviruses (n = 19) from human saliva. The viral genomes identified in this study and their accession numbers are summarized in Table 1. Human reads have been removed from all SRA-deposited data; the SRA-deposited data consist only of reads mapped to the viral genomes described in this study. In 14 saliva samples from healthy individuals, we were able to obtain the complete genomes of viruses representing 55 species. The highest number of vOTUs was present (>50% genome coverage) in Duke_HF4 (n = 20), followed by Duke_HF5 (n = 16) and Duke_HF2 (n = 9) (Figure 1). The rest of the samples contained fewer than seven vOTUs each. Duke_HF4 contained vOTUs from unclassified cressdnaviruses (n = 4), Caudoviricetes bacteriophages (n = 1), microviruses (n = 12), and inoviruses (n = 3). The vOTU consisting of inovirus D_HF1_11 (OR148966) and inovirus D_HF7_9 (OR148967) was present in four samples (Figure 1). The vOTU consisting of inovirus D_HF3_12 (OR148970), inovirus D_HF4_80 (OR148971), and inovirus D_HF5_75 (OR148972) was also present in four samples. Eight of the inovirus vOTUs were present in more than one sample. Five of the microvirus vOTUs were present in two samples. All three of the redondovirus vOTUs were present in two samples.

Anelloviruses
The Anelloviridae family consists of non-enveloped DNA viruses that have a high prevalence across global avian and mammal populations [47,48]. Anelloviruses have been identified across diverse hosts, including humans, nonhuman primates, livestock, birds, and even in some invertebrates (likely derived from a blood meal) [48][49][50]. They have been identified in various host sample types, including tissue, blood, fecal, nasal, and saliva samples [50][51][52]. Anelloviruses are consistently found as a part of various mammal and avian viromes with common coinfections of multiple anelloviruses in one host [47]. Within the context of human anelloviruses, Spandole et al. (2015) estimated that over 90% of humans in some regions, including Russia, Japan, and Pakistan, carry anelloviruses [53]. While anelloviruses have been detected in immunocompromised patients and are an emerging biomarker of immune response and, specifically, organ transplant rejection, they have not been directly associated with pathological effects on their hosts [8].

Cressdnaviruses
Cressdnaviricota is a rapidly growing yet elusive phylum consisting of diverse and globally distributed viruses [57]. The viruses within the phylum Cressdnaviricota infect eukaryotic hosts, including animals, fungi, plants, protists, and potentially archaea [6,58]. Cressdnaviricota currently contains 12 families of circular, replication-associated protein encoding single-stranded (CRESS) DNA viruses (Amesuviridae, Bacilladnaviridae, Circoviridae, Geminiviridae, Genomoviridae, Metaxyviridae, Nanoviridae, Naryaviridae, Nenyaviridae, Redondoviridae, Smacoviridae, and Vilyaviridae) [58,59]. In addition to viruses classified into these 12 families, a large number of cressdnaviruses have been discovered that are yet to be placed within a defined family. The viruses in the phylum Cressdnaviricota are united by their small ssDNA genomes encoding a replication-associated protein (Rep) and capsid protein (Cp). As the Rep is more conserved across cressdnaviruses, it is generally utilized for phylogenetic analyses coupled with pairwise sequence identities for family- and genus-level classifications.
Nine genomes (Figure 3) that encode Rep were identified in this study. These genomes all fall within the Cressdnaviricota phylum based on their Rep analysis (Figure 3). Of these nine, five are part of the family Redondoviridae, three form clusters with various unclassified cressdnaviruses, and one is a singleton based on the Rep-based sequence similarity network and corresponding phylogeny (Figure 3). All cressdnavirus genomes identified in this study encode a Cp and Rep in a bidirectional orientation (Figure 3).

Redondoviruses
Members of the family Redondoviridae were first identified in the human oro-respiratory tract. Redondoviridae is one of the more recently established and highly divergent families within Cressdnaviricota [60,61]. Redondoviruses are circular, ssDNA viruses with ~3-3.1 kb genomes [58]. The Redondoviridae family consists of one genus, Torbevirus, and two species, Brisavirus and Vientovirus, and these viruses have been found in high prevalence within human populations with frequent infections of two or more redondoviruses in a single individual [62]. Kinsella et al. (2022), using a computational workflow and over a thousand metagenomic datasets, predicted redondoviruses' host to be Entamoeba gingivalis, an oral protozoan with an enigmatic role in periodontitis [6]. This was later confirmed by DNA proximity-ligation assay (Hi-C) on xenic culture cells [63].

Despite the close genetic similarity between the redondoviruses characterized in this study and known redondoviruses, this work still adds to a growing body of knowledge about the prevalence of redondoviruses and highlights the success of using viral metagenomics and passive drool techniques to recover complete redondovirus genomes.

Unclassified Cressdnaviruses
Cressdnaviricota is a recently established phylum with 12 established families. Nonetheless, there are many cressdnaviruses that still remain unclassified and that represent new families and species [58]. Based on sequence similarity networks of Rep proteins, unclassified cressdnaviruses may belong to putative family-level clusters, such as CRESS1-6 and Clusters 1 and 2 (Figure 3). In this study, we identified four novel unclassified cressdnaviruses in one individual's saliva sample (D_HF4). The genomes were 2546 nt (OR148959), 1938 nt (OR148961), 2279 nt (OR148960), and 2572 nt (OR148958) in length. These four unclassified cressdnaviruses have GC contents of 39.2% to 56.6%.
Additionally, the Rep of one cressdnavirus, cressdnavirus D_HF4_1353 (OR148958), cannot be placed within any of the current family-level clusters (Figure 3). Based on an NCBI BLASTp search of the Rep of cressdnavirus D_HF4_1353 (OR148958), it is most closely related to the Rep of Cressdnaviricota sp. Miresoil virus 60 (OM154761), sharing 37% amino acid identity (query cover 77%). Cressdnaviricota sp. Miresoil virus 60 (OM154761) is a cressdnavirus identified from bog soil in Sweden [66].
The complete cressdnavirus genomes identified in this study highlight the significant number of novel cressdnaviruses present in just a limited number of samples from healthy individuals. Even within the saliva sample of one individual (D_HF4), cressdnaviruses are impressively diverse. Further, as in the case of cressdnavirus D_HF4_1794 (OR148960) sharing almost 100% amino acid identity with viruses recovered from wastewater, indirect detection of human-associated cressdnaviruses in wastewater is possible.
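The GC contents reported for these genomes follow from a simple base count. A minimal Python sketch is given below; excluding ambiguous bases (such as N) from the denominator is an assumption made here for illustration, and the function name is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T).
    Ambiguous characters are ignored, which is an illustrative assumption."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    print(round(gc_content("ATGCGC"), 1))
```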

Tailed dsDNA Phages
The dsDNA bacteriophages in the viral class Caudoviricetes are the most abundant, diverse group of viruses on Earth, identified from Antarctic soils to the human gut, vaginal, and oral viromes [67][68][69][70]. Aside from the conserved major capsid protein with the characteristic HK97-fold, genomes of the members of Caudoviricetes vary drastically in structure and size [71]. For example, large bacteriophages can have dsDNA genomes ranging from ~18 kb encoding ~20-30 genes (Roundtreeviridae, Salasmaviridae viral families) to ~626 kb encoding hundreds of genes (Bacillus phage G; unassigned family) [72]. While there have been extensive studies on bacteriophages in the human gut virome, we are only beginning to unravel the diversity of bacteriophages in the oral virome [67]. Although this work presents a small number of genomes, these novel genomes add to bacteriophage diversity and will aid in the building of taxonomic frameworks for classification and association with oral bacteria [73,74].
Four complete genomes of viruses in the class Caudoviricetes were identified in this study, and each represents a novel species. The predicted hosts of three of the large phages, caudovirus D_HF2_7 (OR148984), caudovirus D_HF4_2 (OR148985), and caudovirus D_HF5_3 (OR148987), are bacteria within the Actinomycetota phylum (Figure 7). Actinomycetota groups have been found to be important members of both the human microbiome and soil ecosystems [75,76]. Actinomyces are commensal, filamentous bacteria present across the various niches of the human microbiome. Actinomyces are able to induce actinomycosis, a rare granulomatous chronic disease primarily impacting immunocompromised people [76]. The fourth novel large phage, caudovirus D_HF5_2C (OR148986), is predicted to infect bacteria in the Bacillota phylum (Figure 7). Bacillota, a phylum comprising over 200 bacterial genera, has been found to consistently be one of the dominant groups in many human gut microbiome studies [77,78].

Microviruses
Microviruses are small ssDNA bacteriophages known to infect enterobacteria and to be relatively ubiquitous across metagenomic surveys [33,[79][80][81]. Microvirus genomes are ~4-6 kb and usually contain multiple overlapping reading frames. Their genomes typically encode a more conserved major capsid protein (MCP) along with a replication initiator protein (Rep) and scaffolding proteins [33,82]. The Microviridae family is currently composed of two sub-families, Gokushovirinae and Bullavirinae [80]. However, recent research emphasizing the diversity of microviruses has suggested that the current Microviridae family should be elevated to its own order comprising 3 suborders and 19 families [80]. These extensive, potential taxonomic adjustments reveal that our knowledge base of microviruses has rapidly expanded and will continue to do so with the rise of viral metagenomic methods.

... A-C). Virus genomes identified in this study are highlighted in red font and starred. The bacterial host group that the phage is predicted to infect is specified in color to the left of the accession information of each included virus. Genome annotations are depicted below each proteomic tree.

In total, 29 complete microvirus genomes were identified in this study, representing 27 vOTUs. Twelve microvirus vOTUs were identified in one saliva sample (Duke_HF4) (Figure 1). The genomes ranged from 4311 to 7033 nt in length, and all 29 microvirus genomes encode at least an MCP, Rep, and DNA pilot protein (Figure 8). These microviruses have a GC content of 32.7% to 56.6%. Based on the MCP amino acid phylogeny, the microviruses identified in this study are located within the Gokushovirinae sub-family, Alpavirinae putative sub-family, and Pichivirinae putative sub-family (Figure 9). BLASTn analyses were performed to determine the similarity of the microviruses identified in this study to previously characterized microviruses. The 29 microviruses share 70% to 99% nucleotide identity with known microvirus sequences, with a query cover ranging from 3% to 100% (Table 2). Many of these microviruses share high similarity with sequences identified from human metagenome studies [10] (n = 21). Two of the identified microviruses, microvirus D_HF4_150 (OR148995) and microvirus D_HS33_14 (OR149011), share >95% nucleotide sequence similarity with microviruses identified by Tisza et al. (2021), denoting that they are the same species as previously characterized microviruses [10]. Microvirus D_HF4_150 (OR148995) and microvirus D_HS33_14 (OR149011) share 99% (query cover 100%) and 97% nucleotide identity (query cover 93%) with Microviridae sp. cti0q21 (BK051052) and Microviridae sp. ctMkX8 (BK042793), respectively, both from a human oral sample with predicted bacterial genus host Prevotella [10]. Prevotella, an anaerobic Gram-negative bacterium, has been found to be abundant in the human oral cavity, particularly in the subgingival plaque [83], and although most strains have low pathogenicity, some have been associated with chronic inflammatory diseases [84]. The other microviruses share 70-84% nucleotide identity with sequences of microviruses from the human nasopharyngeal cavity (n = 1), tortoise feces (n = 1), minnow tissue (n = 3), freshwater (n = 1), wastewater (n = 1), and a tunicate intestinal tract (n = 1). Overall, the microviruses described here likely infect bacteria, such as Prevotella, residing in the human oral cavity. Although the importance of microviruses in the human gut has been previously emphasized [80], the diversity of microviruses shown here from only 14 saliva samples demonstrates the prominence of microviruses in the human oral cavity. As microviruses infect bacteria that both play a commensal role in the human microbiome and have pathogenic potential, microviruses are important members of the human oral virome, likely controlling the abundance and behavior of their bacterial hosts.
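Because nucleotide identity is reported together with query cover, a hit with high identity over a short aligned region (for example, 3% query cover) is much weaker evidence of relatedness than one spanning the whole genome. The Python sketch below computes query cover as the percentage of query positions included in at least one aligned segment; the function name and the input format, a list of 1-based inclusive (start, end) query coordinates, are hypothetical, and this is an illustration of the concept rather than a reimplementation of NCBI BLAST.

```python
def query_cover(hsps, query_length: int) -> float:
    """Percent of the query covered by the union of aligned segments.
    hsps is an iterable of (qstart, qend) pairs, 1-based and inclusive;
    reversed coordinates are normalized."""
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None:
            cur_start, cur_end = start, end
        elif start <= cur_end:
            # Overlapping segment: extend the current merged interval.
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length


if __name__ == "__main__":
    # Two overlapping segments and one separate segment on a 500-nt query.
    print(query_cover([(1, 100), (50, 150), (300, 400)], 500))
```

Merging overlapping segments before summing avoids counting the same query positions twice when several local alignments cover the same region.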

Inoviruses
The Inoviridae family consists of diverse, filamentous bacteriophages known to infect hosts across the Bacteria domain and potentially across Archaea [85]. Viruses in the family Inoviridae are classified into 25 genera with 43 species [86]. Inoviruses have a 5.5-10.6 kb circular ssDNA genome [86]. The inovirus genome replicates via a rolling-circle mechanism and encodes 7-15 proteins [86]. Inoviruses have the capability to integrate themselves into host genomes and cause chronic infection cycles [85]. Additionally, inoviruses can directly and indirectly impact the toxicity of known pathogenic bacteria, including Vibrio cholerae, Pseudomonas, Neisseria, and Ralstonia [87]. A few specific inoviruses have been extensively studied and used in a variety of genetic engineering applications due to their smaller genome size and uniquely filamentous virion [85]. Yet, the majority of inoviruses remain uncharacterized, as emphasized in the works of Roux et al. (2019) and Tisza et al. (2021), who discovered thousands of inovirus-like sequences across existing genomes and metagenomes [10,85]. Our work here adds to growing efforts to understand inovirus prevalence, diversity, and function in humans using metagenomics.
Based on BLASTn, all 19 inoviruses have the highest nucleotide sequence similarity, ranging from 72-100% nucleotide identity (with 21-100% query cover), with the inoviruses identified from the metagenomes of human oral samples [10]. Additionally, the predicted bacterial hosts, as determined in the work of Tisza et al. (2021), of the closest BLASTn hits of the 19 inoviruses identified in this study include the genera Neisseria (n = 13), Aggregatibacter (n = 3; inovirus D_HF2_82, OR148981; inovirus D_HF2_144, OR148968; inovirus D_HS32_91, OR148977), and Mannheimia (n = 1; inovirus D_HF5_61, OR148978) [10]. While most members of Neisseria are commensal, two species of the Neisseria genus are opportunistic pathogens responsible for cases of meningitis, septicemia, and gonorrhea in humans [89]. Bacteria within the Neisseria genus have iron-regulated proteins related to the RTX toxin superfamily [89,90]. Bacteria in the Aggregatibacter genus can contribute to periodontal disease, particularly in children and adolescents [91]. One of the main virulence factors of bacteria within Aggregatibacter is a leukotoxin capable of causing extensive damage to human immune tissues [91]. Although bacteria in the Mannheimia genus are infrequently associated with disease, some species can cause pneumonia and septicemia in domestic animals and have been isolated from septicemia and wound infections in humans [92]. Similar to Aggregatibacter, Mannheimia's most important virulence factor is a leukotoxin able to cause cell lysis and death [93]. As previous studies have shown inoviruses' ability to impact the toxicity of toxin-producing bacterial genera, inoviruses likely serve a crucial role in controlling the function of pathogenic and commensal bacteria of the human oral virome.

Conclusions
High-throughput sequencing and viral metagenomic workflows are innovative tools to help identify diverse novel and known viruses across complex viral phyla. In this study, we successfully de novo assembled 64 complete genomes of viruses across Anelloviridae, Cressdnaviricota, Caudoviricetes, Microviridae, and Inoviridae, representing 55 species in only 14 saliva samples from healthy individuals in Durham, North Carolina (USA). Two of the complete anellovirus genomes represent the species Alphatorquevirus homin24, and the third anellovirus falls within a lineage of unclassified betatorqueviruses. Four of the five redondoviruses are part of the species Vientovirus, and one is part of Brisavirus. In sample Duke_HF1, we identified two redondoviruses, redondovirus D_HF1_1 (OR148956) and redondovirus D_HF1_3 (OR148957), which are members of two species, and in sample Duke_HF5, we identified two variants, redondovirus D_HF5_1 (OR148962) and redondovirus D_HF5_2R (OR148963), of the species Vientovirus. This shows that multiple variants and species of redondoviruses are circulating in individuals.
All four unclassified cressdnaviruses represent new species within Cressdnaviricota. One unclassified cressdnavirus, cressdnavirus D_HF4_1794 (OR148960), has high similarity with uncultured virus CG233 (KY487902) and CG269 (KY487938) identified from a wastewater sample from Florida (USA) [65], showing that this virus is shed via the fecal route. All four new unclassified cressdnaviruses were identified in a single saliva sample (Duke_HF4). Four Caudoviricetes phages were identified, all representing new species. The 29 microvirus genomes show high BLASTn similarity to microvirus sequences from human metagenomes, the human nasopharyngeal cavity, tortoise feces, minnow tissue, freshwater, wastewater, and a tunicate intestinal tract. The nine genomes fall into lineages of the ViPTree-generated proteome phylogenetic tree.
Overall, the number of complete virus genomes determined, with many of these viruses representing new species, shows that the human oral virome is relatively understudied. Although many of these viruses infect microbes within the human body and not human cells, viruses such as Caudoviricetes phages, microviruses, and inoviruses (which can impact the toxicity of bacterial pathogens) are likely key influencers of bacterial abundance and behavior within the human oral cavity. These bacteriophages are directly impacting bacteria that influence human immunity and health. The discovery of known and new viruses in microbe-infecting phyla is therefore vital for understanding how widespread that impact may be. For all the viral phyla and families studied, we are only beginning to understand their vast diversity and global prevalence even within the human body. The work presented here significantly contributes to our global understanding of viruses in the human oral virome and displays the surprising levels of novel viral diversity that can be revealed from 14 saliva samples. Lastly, this study supports the utility of passive drool sampling in conjunction with metagenomics for effective discovery across diverse viral groups.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v15091821/s1, Figure S1

Informed Consent Statement: Informed consent was obtained from all subjects involved in this study.

Figure 1. Genome coverage plot based on read mapping to vOTUs of viruses identified in this study depicting the presence of all vOTUs across all 14 human saliva samples. Black squares represent 50-100% genome coverage, gray squares represent 25-50% genome coverage, and white squares represent 0-25% genome coverage. Greater than 50% read coverage was used as a high-confidence proxy of vOTU presence in any particular sample. The bar plot on the right depicts the number of samples that each vOTU is present in. The bar plot at the bottom of the plot represents the number of vOTUs present in each saliva sample examined.

Figure 2. (A) Linearized genome organization of the three anellovirus genomes identified in this study. (B) Maximum likelihood phylogenetic tree of the ORF1 amino acid sequences of Alphatorquevirus and Betatorquevirus with Gammatorquevirus serving as the outgroup. aLRT branch support values are denoted by numbers at each node, and branches with values <0.7 aLRT branch support have been collapsed. Virus sequences identified in this study are in red font within the Alphatorquevirus and Betatorquevirus genera.

Figure 3 .
Figure 3. (A) Linearized genome organization of the cressdnaviruses identified in this study. (B) Maximum likelihood phylogenetic tree of the Rep sequences of viruses in the phylum Cressdnaviricota separated into family-level clustering. Family-level clusters that include viruses characterized in this study are colored in red. Cressdnavirus D_HF4_1353 (OR148958), an unclassified cressdnavirus depicted as its own line, falls within Cressdnaviricota but does not fit within the current family-level clusters. Branches with aLRT branch support values <0.8 were collapsed.

Figure 4. Maximum likelihood phylogenetic tree of the Rep amino acid sequences of viruses in the family Redondoviridae. The two species in the genus Torbevirus of the family Redondoviridae, Brisavirus and Vientovirus, are in shaded rectangles. Sample information, including the country in which each sample was taken and the source from which it was collected, is depicted with colored squares next to each accession number and with source silhouettes. aLRT branch support values are denoted by numbers at each node, and branches with values <0.7 were collapsed. Virus sequences identified in this study are highlighted in red font. Please refer to Figure 3B to view the placement of redondoviruses in the Rep-based cressdnavirus phylogenetic tree.

Figure 5. Maximum likelihood phylogenetic tree of the Rep sequences of unclassified family-level clusters within the phylum Cressdnaviricota, i.e., CRESS6 and Cluster 1 (Figure 3). Virus sequences identified in this study are highlighted in red font. Branches with <0.7 aLRT support have been collapsed. Please refer to Figure 3B to view the placement of CRESS6 and Cluster 1 in the Rep-based cressdnavirus phylogenetic tree.


Figure 6. Maximum likelihood phylogenetic tree of the Rep sequences of an unclassified cluster within Cressdnaviricota, Cluster 2 (Figure 3). The virus characterized in this study is depicted in red font. Branches with <0.7 aLRT support have been collapsed. Please refer to Figure 3B to view the placement of Cluster 2 in the Rep-based cressdnavirus phylogenetic tree.


Figure 7. Proteomic trees and annotations of dsDNA phage genomes. The large phages characterized in this study fall primarily in three clades labeled (A-C). Virus genomes identified in this study are highlighted in red font and starred. The bacterial host group that the phage is predicted to infect is specified in color to the left of the accession information of each included virus. Genome annotations are depicted below each proteomic tree.


Figure 8. Linearized genome annotations of microviruses identified in this study.


Figure 9. Maximum likelihood phylogenetic tree of the Microviridae family. The tree branches are colored by sub-families and putative sub-families. Human-derived microviruses from previous studies (gray) and this study (pink) are denoted as short lines around the outer edge of the circular phylogeny.


Figure 10. Linearized genome annotations of inoviruses identified in this study.

Figure 11. Proteomic tree of viruses in the Inoviridae family. Viruses have been placed into two distinct clades within the Inoviridae family, labeled as Clades (A) (blue) and (B) (green). Inoviruses identified in this study are highlighted in red font, and those that have been classified are highlighted in blue font with species names in italics.

Funding: The work described here was supported by TriCEM (Triangle Center for Evolutionary Medicine), Duke Biology, Duke Lemur Center, and Sigma Xi grants awarded to E.N.P.

Institutional Review Board Statement: This study was conducted in accordance with the Declaration of Helsinki and approved by the Duke University Campus Institutional Review Board (IRB Protocol # 2022-0009, date of approval 28 July 2021).

Table 1. Overview of viruses characterized in this study. All viral genomes were deposited in GenBank with accession numbers as shown below.

Table 2. Summary of microviruses identified in this study and their percent identity to the closest related microvirus according to BLASTn.