Diverse Small Circular DNA Viruses Identified in an American Wigeon Fecal Sample

American wigeons (Mareca americana) are waterfowl that are widely distributed throughout North America. Research on viruses associated with American wigeons has been limited to orthomyxoviruses, coronaviruses, and circoviruses. To address this poor knowledge of viruses associated with American wigeons, we undertook a pilot study to identify small circular DNA viruses in a fecal sample collected in January 2021 in the city of Tempe, Arizona (USA). We identified 64 diverse circular DNA viral genomes using a viral metagenomic workflow biased towards circular DNA viruses. Of these, 45 belong to the phylum Cressdnaviricota based on their replication-associated protein sequences, with 3 from the Genomoviridae family and the remaining 42 currently not assignable to any established virus group. It is most likely that these 45 viruses infect various organisms associated with the diet or environment of American wigeons. The remaining 19 virus genomes are part of the Microviridae family and are likely associated with the gut enterobacteria of American wigeons.


Introduction
American wigeons (Mareca americana) are a widely distributed species of migratory waterfowl in North America [1]. Mareca americana is one of three species of wigeons, the other two being M. penelope and M. sibilatrix in the family Anatidae. American wigeons in the Pacific flyway overwinter in the southwest of North America, which includes the states of Arizona, California, and parts of Mexico. During the late spring/summer months, they migrate to parts of Canada and Alaska [2,3]. American wigeons feed in shallow bodies of freshwater (e.g., ponds, lakes, and marshes), and their diet primarily consists of plants and some insects [4]. American wigeons generally inhabit the same environments as other dabbling ducks, such as northern shovelers (Spatula clypeata), mallards (Anas platyrhynchos), green-winged teal (Anas carolinensis), northern pintails (Anas acuta), and gadwalls (Mareca strepera).
Genomoviruses are classified at a species level based on genome-wide pairwise identities.
Given the limited information on viruses associated with American wigeons, we undertook a pilot metagenomic study to identify circular DNA viruses in a fecal sample collected in Tempe, Arizona (USA). We identified 3 genomoviruses, 42 unclassified cressdnaviruses, and 19 microviruses.

Fecal Sampling and High-Throughput Sequencing
An American wigeon fecal sample was collected on 13 January 2021 at Kiwanis Park, Tempe, Arizona, USA. The sample was collected using a sterile tongue depressor following a visual observation of an American wigeon defecating and then placed into a 2 mL tube. It was stored in a −20 °C freezer until processing. The fecal sample (1 g) was homogenized in 2 mL of SM buffer. The homogenate was centrifuged at 10,000× g for 10 min, and the supernatant was sequentially filtered through 0.45 µm and 0.2 µm syringe filters. In total, 200 µL of this filtrate was used to extract viral DNA using the High Pure Viral Nucleic Acid Kit (Roche, USA) following the manufacturer's instructions. Circular DNA in the viral DNA extract was amplified using rolling circle amplification (RCA) with the TempliPhi 100 amplification kit (GE Healthcare, USA). The RCA products were used to generate Illumina sequencing libraries using the DNA TruSeq Nano kit, and they were sequenced on an Illumina HiSeq 4000 sequencer (Illumina, USA) at Psomagen Inc. (USA).

Sequence Assembly and Identification of Viral Contigs
The paired-end reads (2 × 150 nts) were trimmed using Trimmomatic v0.39 [33]. The resulting paired-end reads were then de novo assembled using MEGAHIT v1.2.9 [34], and contigs > 1000 nts in length were screened for viral-like sequences using BLASTx [35] against a viral RefSeq protein sequence database (release 207). All contigs with terminal redundancy were determined to represent circular genomes. All circular genomes that appeared to be eukaryote-infecting viruses were annotated using ORFfinder (ncbi.nlm.nih.gov/orffinder/, accessed on 1 October 2023) coupled with manual checks. Prokaryote-infecting circular DNA viruses were annotated using VIBRANT [36].
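The terminal-redundancy criterion can be illustrated with a minimal sketch. It assumes that a circular genome assembled as a linear contig ends with an exact repeat of its own start. The function names, the 10 nt minimum overlap, and the 300 nt search cap are illustrative assumptions and are not taken from the workflow used in this study.

```python
from typing import Optional


def terminal_overlap(seq: str, min_len: int = 10, max_len: int = 300) -> int:
    """Return the length of the longest prefix that exactly matches the suffix.

    Only overlaps between min_len and max_len (and at most half the contig)
    are considered. Both limits are hypothetical defaults.
    """
    seq = seq.upper()
    upper = min(max_len, len(seq) // 2)
    # Try the longest candidate first so the maximal overlap is reported.
    for k in range(upper, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(seq: str, min_len: int = 10) -> Optional[str]:
    """Trim the redundant 3' end of a terminally redundant contig.

    Returns the unit-length genome, or None when no terminal
    redundancy of at least min_len nucleotides is found.
    """
    k = terminal_overlap(seq, min_len=min_len)
    return seq[:-k] if k else None


# Toy example: a 24 nt "genome" whose first 12 nt are repeated at the end.
unit = "ACGTTGCAAGGCTTAACCGGTATC"
contig = unit + unit[:12]
print(terminal_overlap(contig), circularize(contig) == unit)
```

In practice, an exact-match test such as this one would miss overlaps containing sequencing errors, so a real screen might tolerate a small number of mismatches.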
Since multiple studies have identified some cressdnaviruses as reagent/kit contaminants [20,37-40], we mapped all the reads from all the samples processed at the same time and run on the same lane to the virus genomes identified here using BBMap [41], in order to identify any reagent-associated viruses or those misidentified as a result of barcode-hopping artifacts.
We extracted Rep protein sequences from the representative Rep sequence dataset used for SSN analysis that cluster with those from this study, as well as those from the established viral cressdnavirus families and CRESS Groups 1-6. These were collectively aligned with MAFFT v7.113 [50], and the resulting alignment was trimmed using TrimAl with a gap threshold of 0.2 [51]. A maximum likelihood phylogenetic tree was constructed using IQTree v2.1.3 [52] with a Q.pfam + F + G4 substitution model identified as the best-fit model and with approximate likelihood ratio test (aLRT) branch support [53] inferred from the trimmed alignment. The maximum likelihood phylogenetic tree was visualized with iTOL v6 [54].
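To make the gap-threshold trimming step concrete, the sketch below filters alignment columns by their fraction of non-gap residues. It assumes that a gap threshold of 0.2 means a column is kept when at least 20% of the sequences have a residue at that position. This is a simplified illustration with a hypothetical function name, not the TrimAl implementation.

```python
from typing import List


def trim_columns(alignment: List[str], gap_threshold: float = 0.2) -> List[str]:
    """Keep alignment columns whose non-gap fraction is >= gap_threshold.

    alignment: equal-length aligned sequences, with '-' marking gaps.
    """
    if not alignment:
        return []
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i
        for i in range(n_cols)
        if sum(row[i] != "-" for row in alignment) / n_seqs >= gap_threshold
    ]
    return ["".join(row[i] for i in keep) for row in alignment]


# Toy alignment: the last column has a residue in only 1 of 6 sequences
# (about 17%), so it is removed; the third column (2 of 6) is retained.
toy = ["MKAL", "MKG-", "MR--", "MK--", "MR--", "MK--"]
print(trim_columns(toy))
```

A permissive threshold of this kind removes only columns that are almost entirely gaps, which preserves most of the signal in divergent Rep alignments.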
The Rep amino acid sequences from the Genomoviridae family and unclassified cressdnavirus clusters (CRESSV2, CRESSV6, and Clusters A-Q) were individually aligned using MAFFT v7.113 AUTO mode [50] with appropriate outgroups based on the large Rep phylogenetic tree. Cluster-level alignments were used to determine the best-fit amino acid substitution model using ProtTest3 [55], and maximum likelihood trees were inferred with these models and PhyML3 [56]. In the resulting trees, branches with <0.80 aLRT branch support [53] were collapsed in TreeGraph2 [57]. All pairwise identities were determined using SDT v1.2 [58].
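As an illustration of what a pairwise identity score represents, the following sketch computes the percentage identity of two already aligned sequences, counting only positions where neither sequence has a gap. This is a simplified stand-in under that stated assumption and is not the SDT implementation. The function name and the toy sequences are hypothetical.

```python
def pairwise_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage identity over alignment positions with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # positions with a gap in either sequence are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Toy pair: 5 ungapped positions, 4 of which match.
print(pairwise_identity("MKT-LLA", "MRTALL-"))
```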

Analyses of Microviruses
Major capsid protein (MCP) sequences were extracted from the genome sequences of microviruses and assembled into a dataset that contained 3641 known MCP sequences. The MCP sequences were translated and aligned with those from the study using the MAFFT v7.113 AUTO mode [50]. The resulting alignments were trimmed using TrimAl v1.2 with the gappyout option [51]. The trimmed alignment was used to infer a maximum likelihood phylogenetic tree with IQTree v2.1.3 using the best-fit model [52], and the tree was visualized with iTOL v6 [59].

Identification of Viral Genomes
The de novo assemblies resulted in 3538 contigs with a size range of 200-66,203 nts. Of these, 1228 were >1000 nts. Of these, 672 were identified to be viral-like based on BLASTx analysis representing viruses in phyla Cressdnaviricota (n = 82), Hofneiviricota (n = 10), Nucleocytoviricota (n = 45), Phixviricota (n = 24) and Uroviricota (n = 511). Of all of these, 64 contigs with similarities to viruses in Cressdnaviricota (n = 45) and Phixviricota (n = 19) were identified to have terminal redundancies and thus determined as ones representing complete circular genomes. No raw reads mapping to these contigs were found in any of the other sample libraries processed at the same time in the lab and run on the same flow cell based on our mapping analysis using BBMap [41].
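The contig counts reported above can be checked with a short tally. The sketch below only restates the numbers given in the text; the variable and function names are illustrative.

```python
from typing import Dict

# Viral-like contigs per phylum, as reported from the BLASTx screen.
viral_like: Dict[str, int] = {
    "Cressdnaviricota": 82,
    "Hofneiviricota": 10,
    "Nucleocytoviricota": 45,
    "Phixviricota": 24,
    "Uroviricota": 511,
}

# Contigs with terminal redundancy (complete circular genomes).
circular: Dict[str, int] = {
    "Cressdnaviricota": 45,
    "Phixviricota": 19,
}


def total(counts: Dict[str, int]) -> int:
    """Sum the contig counts across phyla."""
    return sum(counts.values())


print(total(viral_like))  # total viral-like contigs
print(total(circular))    # total complete circular genomes
for phylum, n in circular.items():
    # Share of each phylum's viral-like contigs that were circular.
    print(phylum, f"{100 * n / viral_like[phylum]:.0f}%")
```

The per-phylum counts sum to the 672 viral-like contigs and the 64 complete circular genomes stated in the text.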
For this study, we focus on the complete genomes (Figure 1). Three of the 45 cressdnaviruses are part of the family Genomoviridae and the rest cluster (based on the Rep sequence similarity network) with sequences of unclassified cressdnaviruses. Collectively, the Reps of these cressdnaviruses are part of 1 classified cluster (genomoviruses) and 19 unclassified clusters (CRESSV2, CRESSV6, clusters A-Q), and 10 are singletons (Figure 2). The 19 phixviruses are part of the family Microviridae.
A summary of the BLASTn [35] analysis of the genomes identified here is provided in Table 1. With the exception of three viruses, i.e., wigfec virus K19_469 (OP549795), wigfec virus K19_561 (OP549839), and wigfec virus K19_141 (OP549803), which share >70% pairwise identity with >80% genome coverage, all others are relatively diverse.

Genomoviruses
The genomoviruses identified in this study range in size from 2200 to 2375 nts and encode a CP and a Rep in an ambisense orientation [24].The three genomoviruses identified in this study belong to three different genera with wigfec virus K19_435 (OP549796) in Gemykibivirus, wigfec virus K19_469 (OP549795) in Gemyduguivirus and wigfec virus K19_482 (OP549794) in Gemycircularvirus (Figure 3).The conserved rolling circle replication motif (RCR), geminivirus Rep-like sequences (GRS), and Superfamily 3 (SF3) helicase motifs are present in all the Reps of wigfec genomoviruses (Table 2).
Wigfec virus K19_469 (OP549795) is most similar to Genomoviridae sp. D2_1183 (MW678959), isolated from dust particles in Arizona [60], which is not classified at a species level, sharing 98% genome-wide nucleotide pairwise identity and 100% Rep amino acid identity (Table 3). Given that this virus was also detected in Arizona, it may be that it infects a commonly detected fungus in Arizona. Wigfec virus K19_435 (OP549796) is most similar to Cybaeus spider-associated circular virus 2 BC_I1644B_C3 (MH545507) [61], which belongs to species Gemykibivirus cybusi1, sharing 51% genome-wide nucleotide pairwise identity. Its Rep shares 60% amino acid identity and clusters with other members of species Gemykibivirus cynas1 and Gemykibivirus raski1 (Figure 3). Wigfec virus K19_482 (OP549794) is most similar to gemycircularvirus gemy-ch-rat1 (KR912221), identified from a rat [62], which is part of species Gemycircularvirus ratas1, sharing 51% genome-wide nucleotide pairwise identity and 38% Rep amino acid identity (Table 3).
Wigfec virus K19_435, wigfec virus K19_482, and wigfec virus K19_469 together with Genomoviridae sp. D2_1183 represent three new species based on the previously established 78% genome-wide pairwise identity species demarcation threshold for genomoviruses [23]. All three genomoviruses identified here are likely fungal-infecting viruses, based on what is known of the two genomoviruses with identified fungal hosts (Sclerotinia sclerotiorum and Fusarium graminearum) [26,27], which belong to species Gemycircularvirus sclero1 and Gemytripvirus fugra1 [24].
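The species demarcation logic can be sketched as a simple threshold test on the genome-wide pairwise identities reported above. The sketch assumes that a genome sharing at least 78% identity with its closest relative is placed in the same species as that relative; treating the boundary value itself as "same species" is an assumption, and the function and variable names are illustrative.

```python
from typing import Dict

# Genomovirus species demarcation threshold (percent genome-wide identity).
SPECIES_THRESHOLD = 78.0


def compare_to_best_hit(
    best_hit_identity: Dict[str, float], threshold: float = SPECIES_THRESHOLD
) -> Dict[str, str]:
    """Label each genome relative to its most similar known genome."""
    return {
        name: "same species as best hit"
        if identity >= threshold
        else "distinct species from best hit"
        for name, identity in best_hit_identity.items()
    }


# Genome-wide identities with the closest known genomes, as reported in the text.
reported = {
    "wigfec virus K19_469": 98.0,  # vs. Genomoviridae sp. D2_1183
    "wigfec virus K19_435": 51.0,  # vs. Cybaeus spider-associated circular virus 2
    "wigfec virus K19_482": 51.0,  # vs. gemycircularvirus gemy-ch-rat1
}
for name, call in compare_to_best_hit(reported).items():
    print(name, "->", call)
```

Under this reading, wigfec virus K19_469 groups with Genomoviridae sp. D2_1183, which is itself unclassified at the species level, so the pair together constitutes one new species, while the other two genomes each fall well below the threshold.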

Unclassified Cressdnaviruses
Forty-two cressdnaviruses (size range 1665-3789 nts) could not be assigned to any established cressdnavirus family (Figures 1 and 2). Based on SSN analysis, the Reps of 10 cressdnaviruses are singletons, and 32 cluster with other known Reps within 19 unique clusters (Figure 2). This highlights the diversity of these cressdnaviruses within a single fecal sample. Rep amino acid phylogenetic analysis was undertaken for each cluster with >2 sequences. In the Reps of all these 42 cressdnaviruses, we identified the conserved RCR and SF3 helicase motifs (Table 2). Additionally, in the Reps of wigfec virus K19_467 (OP549797), wigfec virus K19_484 (OP549821), wigfec virus K19_494 (OP549823), and wigfec virus K19_493 (OP549851), which are all part of Cluster J, and wigfec virus K19_486 (OP549822), which is part of Cluster K, we identified a GRS domain (Table 2). The GRS domain in the Rep of wigfec virus K19_486 appears to have a five-residue insertion (DGTVY) (Table 2). CRESSV1-6 have previously been described as unique family-level groupings [43]. Five of the viruses identified in this study (wigfec virus K19_426 (OP549818), wigfec virus K19_588 (OP549828), wigfec virus K19_292 (OP549833), wigfec virus K19_555 (OP549837), and wigfec virus K19_645 (OP549843)) are part of CRESSV2 (Figure 4). Their Reps share ~32-40% amino acid identity with each other and <57% amino acid identity with the Reps of all other viruses in cluster CRESSV2, and they are distributed throughout the CRESSV2 Rep phylogeny (Figure 4).
The Reps of wigfec virus K19_426, wigfec virus K19_588, wigfec virus K19_292, wigfec virus K19_555, and wigfec virus K19_645 are most similar to those of Diporeia sp. associated circular virus LM3487 (KC248416) [63], Antarctic circular DNA molecule COCH21_V_94 (MN328284) [64], uncultured virus CG261 (KY487930) [65], sewage-associated circular DNA virus-20 NZ-BS3900-2012 (KM821755) [66], and Cressdnaviricota sp. ctdb97 (MH510276) [20], sharing 46%, 57%, 46%, 53%, and 56% amino acid identity, respectively (Table 3). Wigfec virus K19_450 (OP549820) is part of CRESSV6 (Figure 5). The Rep of wigfec virus K19_450 shares a pairwise amino acid identity of 49.4% with that of Circovirus-like DCCV-2 (KT149395), identified from a freshwater lake in China, and forms a clade with it phylogenetically (Figure 5, Table 3).
The Rep of wigfec virus K19_668 (OP549845) is part of Cluster A; it shares 51% amino acid identity and clusters with that of Arizlama virus isolate AZLM_1011 (MW697465), which was detected in a lake sample from Arizona (Figure 5).
Microorganisms 2024, 12, x FOR PEER REVIEW 14 of 25
The Reps of wigfec virus K19_562 (OP549827) and wigfec virus K19_691 (OP549846) cluster with that of uncultured virus CG267 (KY487936) [65], and these share <44% amino acid identity (Figure 5, Table 3). The Rep of wigfec virus K19_571 (OP549840) clusters with the Reps of five viruses in Cluster C, sharing ~46-64% amino acid identity with them, and it is most closely related to that of Virus sp. isolate D12_1244 (MW678878) [60]. The Reps of wigfec virus K19_593 (OP549841), wigfec virus K19_558 (OP549838), and wigfec virus K19_432 (OP549819) are part of Clusters D, E, and F, respectively (Figures 6 and 7). Their Reps share the highest similarity of 45%, 72%, and 51% amino acid identity with the Reps of Cressdnaviricota sp. ctcd610 (MH649031) [20], Sewage-associated circular DNA virus-17 (KM821752), and Avon-Heathcote Estuary-associated circular virus 26 NZ-2311TU-2012 (KM874359) [14], respectively (Table 3).
In Cluster G, the genome of wigfec virus K19_561 (OP549839) shares ~90% similarity with the genome of Chicken circovirus 4 CCV-4 (MN428454), identified in the stomach of a red junglefowl (Gallus gallus) from southeast Asia [67] (Table 3). Their Reps share 98.6% amino acid identity. This virus is the only unclassified cressdnavirus that has high similarity to a previously identified virus. Furthermore, the Rep of wigfec virus K19_521 (OP549825) shares ~63% amino acid identity with that of chicken circovirus 2 CCV-2 (MN420497), also from red junglefowl [67] (Table 3). The Rep of wigfec virus K19_525 (OP549836) clusters with the Reps of wigfec virus K19_561 and Chicken circovirus 4 CCV-4, sharing ~59% amino acid identity (Figure 8). Given that several of these circovirus-like genomes have been detected in two bird species, it may be that these are avian viruses or that they infect an organism commonly associated with avian species.

Microviruses
We identified 19 microviruses that range in size from 4182 to 6389 nts.All of the 19 microviruses encode a major capsid protein (MCP) and a replication initiator protein.
The MCP phylogeny reveals that the MCPs from this study are broadly distributed across several clades, with eight in the subfamily Gokushovirinae, three in the proposed putative subfamily clade Alpavirinae, and four in the Pichovirinae (Figures 1 and 11) [69]. Four of the identified microviruses fall outside of these (Figures 1 and 11). The MCPs of these viruses in general share 38-77% highest amino acid identity with those of microviruses identified from various environments (Table 4). These microviruses likely infect the gut enterobacteria of the American wigeon, and they all represent new species based on the 95% species threshold used for bacteriophages, as these genomes share <88% genome-wide identity with all other microvirus genomes in GenBank.

Conclusions
American wigeons play a vital ecological role in wetland ecosystems across North America. These birds travel hundreds of kilometers during their migration seasons and can provide insight into viral diversity due to their interactions across different habitats. We identified 42 unclassified cressdnavirus, 3 genomovirus, and 19 microvirus genomes from a single sample using a non-invasive fecal sampling approach. The unclassified cressdnaviruses identified in this study are diverse. The three members of the Genomoviridae family are part of three different genera: Gemykibivirus, Gemyduguivirus, and Gemycircularvirus. These genomoviruses most likely infect fungi associated with American wigeons; however, in general, little is known about their host range. In total, 10 cressdnaviruses are singletons, and 32 cluster into 20 family-level groups. In addition, 3 cressdnaviruses, wigfec virus K19_521, wigfec virus K19_467, and wigfec virus K19_561, are most similar to genomes detected from avian samples, and wigfec virus K19_469 is most similar to a Gemyduguivirus from airborne dust particles; however, the rest are diverse viruses. In general, these 42 unclassified cressdnaviruses likely represent at least 40 new species of viruses, as they share <80% genome-wide identity with other virus genomes in GenBank. The 19 microviruses we identified most likely infect the gut microbiota of the American wigeon, and together they represent 19 new species. This pilot study highlights the diverse viral community within just a single fecal sample of an American wigeon. Although we cannot determine whether any of the eukaryote-infecting viruses we identified in this study infect the American wigeon, they expand our knowledge of the diversity of ssDNA viruses, and with more studies, we will be able to start understanding the ecology of these viruses.

Figure 1. Summary of the genomes of cressdnaviruses (A) and microviruses (B) identified from the American wigeon fecal sample. Circular genomes are shown in a linear representation.



Figure 2. The Rep amino acid maximum likelihood phylogenetic tree inferred with IQTree2 (Minh et al., 2020) [52] with Q.pfam + F + G4 substitution model identified as the best-fit model for the viruses in the Cressdnaviricota phylum. The family-level clustering for unclassified CRESS groups was determined by sequence similarity networks (SSN) of the amino acid sequences of the cressdnavirus Rep with a sequence similarity score of 60 using EFI-EST [44] and visualized with Cytoscape v3.8.2 [49]. The Reps identified from this study are shown in blue and are grouped into the Genomoviridae family, 20 family-level clusters (CRESSV2, CRESSV6, A-Q), and 10 singletons.

Figure 3. Maximum likelihood phylogenetic relationship of the Rep protein sequences of representative sequences (species-level) of viruses in genera Gemyduguivirus, Gemykibivirus, and Gemycircularvirus. The maximum likelihood phylogenetic tree was inferred using PhyML 3 [56] and rooted with Rep sequences of geminiviruses with LG + G + I as the best-fit model determined using ProtTest 3 [55]. All sequences from this study are highlighted in blue font, and for gemycircularviruses, a zoomed-in section of phylogeny is shown.


Figure 4. Maximum likelihood phylogenetic relationship of the Rep protein sequences of unclassified cressdnaviruses in cluster CRESSV2. The maximum likelihood phylogenetic tree was inferred using PhyML 3 [56] with VT + I + G as the best-fit model determined using ProtTest 3 [55] and rooted with Rep sequences from the CRESSV5 cluster. Sections of phylogeny are zoomed in to show the details in relation to the Reps from this study of the viruses that are part of the CRESSV2 cluster. All sequences from this study are highlighted in blue font.


Figure 5. Maximum likelihood phylogenetic relationship of the Rep protein sequences of unclassified cressdnaviruses in Clusters CRESSV6 (rooted with redondovirus Rep sequences) and Clusters A, B, and C (rooted with CRESSV5 Rep sequences). The maximum likelihood phylogenetic trees of each cluster were inferred using PhyML 3 [56] with LG + I + G for the CRESSV6 cluster and LG + I + G + F for Clusters A, B, and C as best-fit models determined using ProtTest 3 [55]. All sequences from this study are highlighted in blue font.


Figure 6. Maximum likelihood phylogenetic relationship of the Rep protein sequences of unclassified cressdnaviruses in Clusters D and E (both rooted with CRESSV5 Rep sequences). The maximum likelihood phylogenetic trees of each cluster were inferred using PhyML 3 [56] with LG + I + G + F for Cluster D and RtRev + I + G + F for Cluster E as best-fit models determined using ProtTest 3 [55]. All sequences from this study are highlighted in blue font.


Figure 7. Maximum likelihood phylogenetic relationship of the Rep protein sequences of unclassified cressdnaviruses in Cluster F. The maximum likelihood phylogenetic tree was inferred using PhyML 3 [56] with LG + I + G as the best-fit model determined using ProtTest 3 [55] and rooted with Rep sequences from the redondoviruses. The sequence from this study is highlighted in blue font.


Figure 8. Maximum likelihood phylogenetic relationship of the Rep protein sequences of unclassified cressdnaviruses in Cluster G. The maximum likelihood phylogenetic tree was inferred using PhyML 3 [56] with RtRev + I + G + F as the best-fit model determined using ProtTest 3 [55] and rooted with Rep sequences from the redondoviruses. All sequences from this study are highlighted in blue font.

Figure 9. Maximum likelihood phylogenetic relationship of the Rep protein sequences of unclassified cressdnaviruses in Clusters H, I, and J (all rooted with redondovirus Rep sequences). The maximum likelihood phylogenetic trees of each cluster were inferred using PhyML 3 [56] with RtRev + I + G + F for Cluster H, VT + I + G + F for Cluster I, and LG + I + G + F for Cluster J as best-fit models determined using ProtTest 3 [55]. All sequences from this study are highlighted in blue font.

Figure 10. Maximum likelihood phylogenetic relationship of the Rep protein sequences of unclassified cressdnaviruses in Clusters K, L, M, N, O, P, and Q. The maximum likelihood phylogenetic trees of each cluster, rooted with Rep sequences of geminiviruses for Cluster K, CRESSV6 for Clusters L and M, and redondoviruses for Clusters O, P, and Q, were inferred using PhyML 3 [56] with LG + I + G + F (Cluster K), LG + I + G (Clusters M and N), and RtRev + I + G + F (Clusters O, P, and Q) as best-fit models determined using ProtTest 3 [55]. All sequences from this study are highlighted in blue font.



Figure 11. Maximum likelihood cladogram of the major capsid protein (MCP) sequences from members of the Microviridae family inferred using IQTree2 with LG + F + G4 (Minh et al., 2020) [52] determined as the best-fit amino acid substitution model. Branches are shown with branch support > 0.8 aLRT. Sub-families Bullavirinae and Gokushovirinae and putative families Alpavirinae, Parabacteroides, and Pichovirinae are shown in different-colored clades.

Table 1. Summary of the BLASTn analysis of the virus genomes identified in this study showing top hits with genome coverage, e-value, and percentage identity. Entries marked with "-" had no BLASTn hit.

Table 2. Summary of the RCR and SF3 helicase motifs identified in the Reps of the cressdnaviruses from this study.

Table 3. Summary of the pairwise identity of the Rep amino acid sequences of the cressdnaviruses identified in this study with their top hits. Percentage pairwise identity determined using SDT v1.2 [58].

Table 4. Summary of the pairwise identity of the MCP amino acid sequences of the microviruses identified in this study with their top hits. Percentage pairwise identity determined using SDT v1.2 [58].