Open Access Article

Virus Discovery in Desert Tortoise Fecal Samples: Novel Circular Single-Stranded DNA Viruses

1
School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
2
The Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ 85287, USA
3
Natural Resources Program, Naval Facilities Engineering Command-Navy Region Southwest, San Diego, CA 92101, USA
4
Department of Anthropology, University of Utah, Salt Lake City, UT 84112, USA
5
Center for Evolution and Medicine, Arizona State University, Tempe, AZ 85287, USA
6
Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of Cape Town, Cape Town 7925, South Africa
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Viruses 2020, 12(2), 143; https://doi.org/10.3390/v12020143
Received: 26 October 2019 / Revised: 18 January 2020 / Accepted: 21 January 2020 / Published: 26 January 2020
(This article belongs to the Special Issue Viromics: Approaches, Advances, and Applications)

Abstract

The Sonoran Desert tortoise Gopherus morafkai is adapted to the desert and plays an important ecological role in this environment. There is limited information on the viral diversity associated with tortoises (family Testudinidae), and to date no single-stranded DNA virus has been identified in association with these animals. This study aimed to assess the diversity of DNA viruses associated with the Sonoran Desert tortoise by sampling their fecal matter. A viral metagenomics approach was used to identify the DNA viruses in fecal samples from wild Sonoran Desert tortoises in Arizona, USA. In total, 156 novel single-stranded DNA viruses were identified from 40 fecal samples. Of these, 146 belong to two known viral families, the Genomoviridae (n = 27) and Microviridae (n = 119), and 10 belong to the unclassified group of circular replication-associated protein encoding single-stranded (CRESS) DNA viruses. In addition, five circular molecules encoding virus-like proteins were recovered.
Keywords: Arizona; Genomoviridae; Microviridae; CRESS DNA viruses; Gopherus morafkai

1. Introduction

Sonoran Desert tortoises (Gopherus morafkai) are long-lived animals (>50 years in the wild) [1,2] adapted to the Mojave and Sonoran deserts of southwestern North America. Spending much of their time underground in burrows or shelters, including months spent brumating during winter [3,4], they interact closely with desert soils and have commensal relationships with many desert animals (e.g., ground squirrels, wood rats, snakes, and spiders) through shared use of shelters [5]. Sonoran Desert tortoises eat a wide range of native desert grasses [6], are active in the summer monsoon [7], occupy rocky hillsides and streambeds laden with caliche caves [8], and usually use existing rock or caliche shelters [7]. They are thought to have speciated in isolation from the Mojave desert tortoise (Gopherus agassizii) approximately 5 MYA when the Colorado River bisected the ancestral population and began flowing into the Gulf of California [7,8]. Differences in the timing and amount of rainfall between the Mojave and Sonoran deserts may also have led to differential adaptation between these species over the same time period [9].
The Sonoran Desert tortoise does not appear to face the health-related effects of the upper respiratory tract disease observed in the Mojave desert tortoise, which is caused by the infectious bacterium Mycoplasma agassizii [10]. This may result from an inherently different immunological response, a lower frequency of encountering the pathogen, or healthier populations due to higher genetic diversity and census size [11,12,13]. There is limited information about other pathogens infecting the tortoise family Testudinidae. So far, viruses from several families have been identified as infecting members of Testudinidae, including picornaviruses [14,15], iridoviruses [16], herpesviruses [17], adenoviruses [18,19,20,21], paramyxoviruses [22,23], and the retrovirus Rous sarcoma virus [24]. There is no information on single-stranded DNA (ssDNA) viruses associated with members of the Testudinidae family and limited viral disease information on the Sonoran Desert tortoise (G. morafkai).
We used non-invasive fecal sampling coupled with a viral metagenomic approach to identify the circular ssDNA virus diversity associated with the Sonoran Desert tortoise. We identified novel ssDNA viruses that belong to the Genomoviridae [25] and Microviridae [26] families. In addition, we identified a suite of unclassified circular replication-associated protein (Rep) encoding single-stranded (CRESS) DNA viruses and small ssDNA molecules encoding virus-like proteins. CRESS DNA viruses include viruses in the families Bacilladnaviridae [27], Circoviridae [28], Geminiviridae [29], Genomoviridae [25], Nanoviridae [30] and Smacoviridae [31,32]. Through viral metagenomics approaches, a large number of novel CRESS DNA viruses have been identified that do not fall within the established viral taxonomy framework and thus are loosely referred to as ‘unclassified CRESS DNA viruses’.

2. Materials and Methods

2.1. Sample Collection and Processing

Sonoran Desert tortoise fecal samples (n = 40) were collected in the Sonoran Desert of Arizona, USA, in 2013 and 2014. Samples were collected in the field from five locations in Maricopa, Yavapai, Pinal, and La Paz Counties, stored in separate plastic bags, and frozen once returned to the field station. DNA was extracted as previously described by Steel et al. [33]. Briefly, approximately 5 g of fecal material per sample was homogenized in 20 mL SM buffer (0.1 M NaCl, 50 mM Tris-HCl [pH 7.4], 10 mM MgSO4) and centrifuged at 4700× g for 15 min. The supernatant was filtered sequentially through 0.45 µm and 0.22 µm syringe filters. Three grams of PEG 8000 (Sigma, St. Louis, MO, USA) was added to 15 mL of each of the filtrates, and the solution was mixed gently to suspend the PEG and incubated overnight at 4 °C to precipitate virions. The mixture was centrifuged at 15,000× g for 20 min and the resulting pellet was resuspended in 1 mL of filtrate. Viral DNA was extracted from 200 µL of resuspension using the Roche High Pure Viral nucleic acid kit (Roche Diagnostics, Indianapolis, IN, USA).

2.2. Illumina Sequencing and Data Processing

Circular molecules were amplified using rolling circle amplification (RCA) with the Illustra TempliPhi 100 Amplification Kit (GE Healthcare, Chicago, IL, USA). The RCA products of the 40 samples were pooled into five groups each with 6–9 samples. These pools were used to generate a 2 × 100 bp library and sequenced on a HiSeq 4000 platform at Macrogen, Inc. (Seoul, Korea). The raw paired-end reads were trimmed using Trimmomatic v0.36 [34] and de novo assembled using metaSPAdes 3.12.0 [35]. Circular contigs were identified based on terminal sequence redundancy and all contigs >500 nucleotides (nt) were analyzed using BLASTx [36] against a local viral RefSeq protein database (complete database downloaded on 15 July 2019) compiled from GenBank.
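The terminal-redundancy criterion can be illustrated with a minimal Python sketch. It assumes that a circular contig is one whose 3′ end repeats its 5′ end exactly; the function names and the `min_overlap` default are hypothetical and do not reflect the settings used in this study.

```python
def find_terminal_overlap(contig: str, min_overlap: int = 20) -> int:
    """Return the length of the longest suffix of `contig` that exactly
    repeats its prefix, or 0 if no repeat of at least `min_overlap` exists."""
    max_len = len(contig) // 2
    for k in range(max_len, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def trim_circular(contig: str, min_overlap: int = 20):
    """Return the contig with the redundant 3' copy removed if it looks
    circular, otherwise None (treated here as a linear contig)."""
    k = find_terminal_overlap(contig, min_overlap)
    return contig[:-k] if k else None


# Toy example: a 40 nt "genome" followed by a repeat of its first 12 nt.
genome = "A" * 10 + "C" * 10 + "G" * 10 + "T" * 10
contig = genome + genome[:12]
overlap = find_terminal_overlap(contig, min_overlap=5)
unit_genome = trim_circular(contig, min_overlap=5)
```

Exact matching is a simplification; assemblies with sequencing errors in the repeated ends would need a tolerant comparison.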

2.3. Recovery of Complete Genomes of Genomoviruses and Unclassified CRESS DNA Viruses

Abutting primers were designed (Supplementary Table S1) based on the de novo assembled contigs for genomoviruses and unclassified CRESS DNA viruses. These were used for PCR amplification of full genomes. For each PCR amplification, 1 µL of RCA product from each pooled sample was used as a template with primer pairs and Kapa HiFi Hotstart Ready Mix (Kapa Biosystems, Wilmington, MA, USA) using the following thermal cycling conditions: initial denaturation at 95 °C for 3 min, followed by 25 cycles of 98 °C for 20 s, 60 °C for 15 s and 72 °C for 3 min, a final elongation at 72 °C for 3 min and a final renaturation at 4 °C for 10 min. The amplicons were resolved on a 0.7% agarose gel, and amplicons of ~1.5–3 kb were excised, gel purified, ligated into the pJET1.2 vector (ThermoFisher Scientific, Waltham, MA, USA) and transformed into Escherichia coli XL1-Blue competent cells. The recombinant plasmids were Sanger sequenced at Macrogen Inc. (Seoul, Korea) by primer walking and these sequence contigs were assembled using Geneious 11.1 [37].
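As an illustration of what "abutting" means on a circular template, the Python sketch below picks a forward primer starting at a chosen site and a reverse primer that is the reverse complement of the bases immediately upstream, so that the pair amplifies the whole circle. The function name, the fixed primer length and the toy sequence are hypothetical; the primers actually used are listed in Supplementary Table S1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def abutting_primers(genome: str, site: int, length: int = 20):
    """Return (forward, reverse) primers that abut at `site` on a circular
    genome. Assumes 0 <= site < len(genome) and length <= len(genome)."""
    n = len(genome)
    unrolled = genome + genome  # lets slices wrap around the origin
    forward = unrolled[site:site + length]
    start = (site - length) % n
    reverse = reverse_complement(unrolled[start:start + length])
    return forward, reverse


# Toy 20 nt circle; the reverse primer wraps around the origin.
toy = "ATGCCGTAAGCTTGACCTGA"
fwd, rev = abutting_primers(toy, site=2, length=4)
```

The sketch ignores melting temperature, GC content and secondary structure, all of which matter when designing real primers.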

2.4. Viral Sequence Analysis

All the complete microvirus genomes from the de novo assemblies were checked by mapping processed raw reads to the assembled genomes using BBMap v 38.32 [38]. All the open reading frames for genomoviruses and the unclassified CRESS DNA viruses were determined using ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/) coupled with manual determination of splice sites for the Reps. For the microviruses, the ORFs were identified using Glimmer [39]. All genome and encoded protein sequence pairwise identities were determined using SDT v1.2 [40].
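For readers unfamiliar with pairwise identity scores, the following Python sketch shows one common definition: the percentage of matching positions among aligned columns in which neither sequence has a gap. That this matches the calculation performed by SDT is an assumption, and the function name is hypothetical.

```python
def pairwise_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment,
    ignoring columns where either sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Eight ungapped columns, seven of which match.
identity = pairwise_identity("ACGT-ACGT", "ACGTTACGA")
```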
A dataset of Reps assembled in Fontenele et al. [41] was used together with the Rep protein sequences identified in this study to construct a sequence similarity network (SSN) using EFI-EST [42,43] with a similarity score of 60, which allows for clear viral family-level clustering. The SSN was visualized using the organic layout in Cytoscape V3.7.1 [44].
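The logic of threshold-based clustering in an SSN can be sketched in Python as follows: sequences are nodes, an edge is kept when the pairwise score meets the threshold, and clusters are the connected components. This is not the EFI-EST implementation; the function name, the input format and the toy scores are hypothetical, and the score is treated simply as a value for which higher means more similar.

```python
from collections import defaultdict


def ssn_clusters(nodes, scored_pairs, threshold=60.0):
    """Group nodes into connected components using only edges whose
    score is at least `threshold`. Returns clusters, largest first."""
    adjacency = defaultdict(set)
    for a, b, score in scored_pairs:
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for node in nodes:
        if node in seen:
            continue
        component = set()
        stack = [node]
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(adjacency[current] - seen)
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


# Hypothetical Rep sequences and pairwise scores.
reps = ["rep1", "rep2", "rep3", "rep4"]
pairs = [("rep1", "rep2", 75), ("rep2", "rep3", 62), ("rep3", "rep4", 40)]
clusters = ssn_clusters(reps, pairs, threshold=60)
```

Here `rep4` ends up as a singleton because its only edge falls below the threshold, which mirrors how singletons arise in the network described above.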

2.5. Phylogenetic Analysis

2.5.1. Genomoviruses

The Rep protein sequences of the genomoviruses were aligned using MUSCLE [45] and the resulting alignment was used to infer a Maximum-Likelihood phylogenetic tree using PhyML 3.0 [46] with the rtREV+G+I+F substitution model (inferred as the best-fit model using ProtTest [47]) and the approximate likelihood ratio test (aLRT) for branch support. The genomovirus Rep amino acid sequence Maximum-Likelihood phylogenetic tree was rooted with geminivirus Rep sequences. The phylogenetic tree was visualized using iTOL [48].

2.5.2. Unclassified CRESS DNA Viruses

For the unclassified CRESS DNA viruses, Reps encoded by viruses identified in this study that fell within SSN clusters of ≥5 sequences were aligned with the other sequences in that cluster using MUSCLE [45]. Maximum-Likelihood phylogenetic trees were inferred from these alignments using PhyML 3.0 [46] with the rtREV+G substitution model for the three phylogenetic trees, based on results from ProtTest [47], and with the approximate likelihood ratio test (aLRT) for branch support. The phylogenetic trees were midpoint rooted and branches with <0.8 aLRT support were collapsed using TreeGraph2 [49]. All the phylogenetic trees were visualized using iTOL [48].

2.5.3. Microviruses

The major capsid proteins (MCPs) of 2590 microviruses available in GenBank, 88 from metagenomics studies described in Roux et al. [50] and Krupovic and Forterre [51], and 119 (Supplementary Data 1) from this study were aligned using PROMALS3D [52,53]. The resulting alignment was used to infer an approximately-Maximum-Likelihood phylogenetic tree using FastTree 2 [54]. The resulting tree was rooted with MCPs of viruses in the subfamily Bullavirinae, and visualized and annotated using iTOL [48].

2.6. Recombination Analysis of Genomoviruses

The full genome alignments, aligned using MAFFT [55], of gemycircularviruses (n = 208), gemykibiviruses (n = 123) and gemykoloviruses (n = 51) were used to identify evidence of recombination using RDP 4 [56]. The recombination analysis was run with default settings using the detection methods RDP [57], GENECONV [58], BOOTSCAN [59], MAXCHI [60], CHIMERA [61], SISCAN [62] and 3SEQ [63]. Recombination events that were detected by three or more methods with p-values < 0.05 coupled with phylogenetic support were considered credible.
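The acceptance rule for recombination events can be expressed as a short Python filter. The record layout (`id`, `p_values`, `phylogenetic_support`) and the example values are hypothetical and do not correspond to RDP 4 output files; the sketch only encodes the stated criteria of three or more methods with p < 0.05 plus phylogenetic support.

```python
def credible_events(events, min_methods=3, alpha=0.05):
    """Return ids of events detected by at least `min_methods` methods
    with p < alpha that also have phylogenetic support."""
    credible = []
    for event in events:
        supporting = [
            method
            for method, p in event["p_values"].items()
            if p is not None and p < alpha  # None = method did not detect it
        ]
        if len(supporting) >= min_methods and event.get("phylogenetic_support", False):
            credible.append(event["id"])
    return credible


# Hypothetical events for illustration only.
events = [
    {"id": "event_1", "phylogenetic_support": True,
     "p_values": {"RDP": 1e-6, "GENECONV": 3e-4, "MAXCHI": 0.01, "3SEQ": None}},
    {"id": "event_2", "phylogenetic_support": True,
     "p_values": {"RDP": 0.02, "GENECONV": 0.20, "MAXCHI": 0.04}},
    {"id": "event_3", "phylogenetic_support": False,
     "p_values": {"RDP": 1e-5, "BOOTSCAN": 1e-4, "SISCAN": 1e-3}},
]
accepted = credible_events(events)
```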

3. Results and Discussion

3.1. Identification of ssDNA Virus Genomes in Sonoran Desert Tortoise Fecal Samples

Using a viral metagenomic approach, we identified 156 unique complete genomes of ssDNA viruses from 40 fecal samples of Sonoran Desert tortoises. Of these, 27 are unique genomoviruses, 119 are unique microviruses and 10 are unique unclassified CRESS DNA viruses. In addition, we identified four unique Rep-encoding and one unique CP-encoding circular molecules.
Hallmarks of all Rep proteins of CRESS DNA viruses are the conserved rolling-circle replication (RCR) endonuclease and superfamily 3 (SF3) helicase domains [64,65]. In 39 of the 41 Rep-encoding viruses/molecules from this study, we were able to identify the three RCR motifs (Motifs I, II and III) and the three SF3 helicase motifs (Walker A, Walker B, and Motif C) (Table 1). For the unclassified CRESS DNA virus MK858258, we were unable to identify the entire RCR domain, and for the Rep-encoding circular molecule MK858264 we were unable to identify Motif I of the RCR domain.
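The counts reported in this section can be reconciled with a few lines of Python. The figure of 41 Rep-encoding viruses/molecules is reproduced here under the assumption that it comprises the genomoviruses, the unclassified CRESS DNA viruses and the four Rep-encoding circular molecules; the text does not state this breakdown explicitly.

```python
# Counts taken from Section 3.1.
genomes = {
    "genomoviruses": 27,
    "microviruses": 119,
    "unclassified CRESS DNA viruses": 10,
}
circular_molecules = {"Rep-encoding": 4, "CP-encoding": 1}

total_viruses = sum(genomes.values())

# Assumed breakdown of the Rep-encoding set (not stated in the text).
rep_encoding = (
    genomes["genomoviruses"]
    + genomes["unclassified CRESS DNA viruses"]
    + circular_molecules["Rep-encoding"]
)

# Two Rep-encoding sequences lacked a complete motif set.
with_all_motifs = rep_encoding - 2
```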

3.2. Genomoviruses

Genomoviridae is a recently established family of diverse circular ssDNA viruses [66]. Genomoviruses have an ambisense genome organization and genomes of ~1.9–2.3 kb that encode a CP on the virion sense and a Rep on the complementary sense. The family Genomoviridae is divided into nine genera (Gemycircularvirus, Gemyduguivirus, Gemygorvirus, Gemykibivirus, Gemykolovirus, Gemykrogvirus, Gemykroznavirus, Gemytondvirus and Gemyvongvirus) [25]. In general, the genomoviruses are classified at a species level based on their genome-wide pairwise identity with a species cutoff threshold of 78%. Even though genomoviruses have been identified from various sources (animal fluid, tissue and fecal samples, wastewater, river sediments and plant material), Sclerotinia sclerotiorum hypovirulence-associated DNA virus 1 (SsHADV-1) is the only genomovirus with a known host, the fungus Sclerotinia sclerotiorum, in which it induces hypovirulence [67]. Thus, it is highly likely that genomoviruses infect fungi.
The tortoise-associated genomoviruses (n = 27) identified in this study can be assigned to three of the nine genera, i.e., Gemycircularvirus (n = 10), Gemykibivirus (n = 15) and Gemykolovirus (n = 2) based on their Rep amino acid sequence (Figure 1). The 10 gemycircularviruses identified in tortoise feces share 66%–98% genome-wide nucleotide pairwise identity between themselves and 65%–83% with other gemycircularvirus sequences in GenBank. These tortoise genomoviruses can be further classified as belonging to six new species, all of which would need to be established based on the criteria outlined for the classification of genomoviruses (i.e., 78% pairwise identity threshold) [25]. Even though MK570223 shares ~82.5% nucleotide identity with a gemycircularvirus identified from Varroa mite samples from New Zealand [68], that virus has not yet been classified and together they would represent a new species. The 15 gemykibiviruses identified in tortoise feces share >94% nucleotide pairwise identity amongst themselves and >90% with gemykibiviruses identified from house finch feces and nests from Arizona [69]; collectively, these represent new species. The two gemykoloviruses share ~65% nucleotide pairwise identity with each other and 66%–70% with other gemykolovirus sequences in GenBank, and they represent two new species. A summary of the gemycircularvirus (n = 10), gemykibivirus (n = 15) and gemykolovirus (n = 2) Rep and CP amino acid sequence pairwise identities between themselves and those encoded by genomovirus sequences in GenBank is provided in Table 2.
We identified nine recombination events in the genomoviruses from this study, four events in six genomes of gemycircularviruses and five events in 15 genomes of gemykibiviruses (Figure 2 and Table 3). Five of the nine recombinant regions span most of the cp gene (766–1121 nt) and one (917 nt) the rep gene. Three small recombinant regions were identified in the rep genes spanning 52–315 nt (Figure 2 and Table 3). In the case of one gemykibivirus genome (MK570211), ~76% of the genome is recombinant. We found no evidence of recombination in the genomes of the two gemykoloviruses.
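The species demarcation rule can be applied to the identities quoted above with a minimal Python check. Whether a value exactly at 78% counts as the same species is an assumption of this sketch, and the function name is hypothetical.

```python
SPECIES_THRESHOLD = 78.0  # genome-wide pairwise identity cutoff (%) [25]


def same_species(identity_percent: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if two genomes meet the species threshold (assumed inclusive)."""
    return identity_percent >= threshold


# MK570223 versus the Varroa mite gemycircularvirus (~82.5% identity):
# above the cutoff, so the two would fall into one species.
varroa_pair = same_species(82.5)

# The two gemykoloviruses (~65% identity): below the cutoff,
# consistent with their assignment to two separate species.
gemykolovirus_pair = same_species(65.0)
```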

3.3. Unclassified Eukaryotic CRESS DNA Viruses and Circular DNA Molecules

Over the last decade, a significant number of novel CRESS DNA viruses have been discovered in various environments. This has primarily been facilitated by viral metagenomic studies using high throughput sequencing approaches. Most of these novel CRESS DNA viruses cannot be classified into established viral families and thus are referred to as unclassified CRESS DNA viruses. Here, we identified 10 CRESS DNA viruses ranging in size from 1547 to 2300 nt (Figure 3). Given the large number of unclassified CRESS DNA viruses (>2000), we used an SSN based approach to cluster the Rep sequences of those that are classified with those from this study (Figure 3). With an SSN threshold of 60, we are able to generate family-level clusters which support currently classified viruses, i.e., Bacilladnaviridae, Circoviridae, Geminiviridae, Genomoviridae, Nanoviridae and Smacoviridae, as well as Alphasatellitidae and a recently proposed family-level group of viruses called redondoviruses (Figure 3). The only Rep amino acid sequences of viruses identified in this study that cluster with any of these are those of genomoviruses. All other Reps of CRESS DNA viruses and Rep-encoding circular molecules form four clusters, and three are singletons.
Six of the unclassified CRESS DNA viruses (MK858252-MK858257) have a similar genome organization, i.e., their CP is encoded on the virion sense and their Rep (that has two predicted introns) on the complementary sense. These six genomes share >62% genome-wide nucleotide identity and all have a ‘TAAGATTAC’ nonanucleotide motif. Their Reps share >56% amino acid identity, whereas their CPs share >54% amino acid identity. These Rep amino acid sequences cluster in the SSN and phylogenetically with Reps of viruses from a termite mound (sampled in Kenya) [70] and capybara feces (sampled in Brazil) [41] (Figure 3). The Rep amino acid sequences of the circular molecules MK858259 and MK858260 are most closely related to that of MF118167 from a human fecal sample [71], sharing ~49% amino acid identity, whereas the Rep of MF373642 shares 42% amino acid identity with that of KY487833 from wastewater [72] and clusters with other wastewater-derived sequences (Figure 3; Table 4). Sequences of MF373642, MK858258, MK858262 and MK858265 encode Reps that share 30%–46% amino acid identity with other unclassified CRESS DNA virus Reps.
In addition to the unclassified CRESS DNA viruses, five circular molecules ranging in size from 1684 to 3209 nt were also identified. Four of these five encode at least a Rep and one encodes only a CP (which shares ~52% amino acid identity with that of MK858258). Of the four that encode a Rep, one circular molecule (MK858264) also encodes a site-specific integrase sharing 62% identity (99% coverage; E-value 9 × 10^-175) with one from an Oscillibacter sp. (CDB27191).
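Locating a nonanucleotide motif such as ‘TAAGATTAC’ in a circular genome requires that matches spanning the arbitrary linearization point are not missed. The Python sketch below handles this by appending the first few bases to the end of the sequence before searching both strands; the function name and the toy genome are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_motif_circular(genome: str, motif: str = "TAAGATTAC"):
    """Return 0-based start positions of `motif` on the given strand ("+")
    and of its reverse complement ("-"), treating `genome` as circular.
    Assumes len(motif) <= len(genome)."""
    n = len(genome)
    # Append the first len(motif) - 1 bases so wrap-around matches are found.
    unrolled = (genome + genome[:len(motif) - 1]).upper()
    rc_motif = motif.translate(COMPLEMENT)[::-1]
    plus = [i for i in range(n) if unrolled.startswith(motif, i)]
    minus = [i for i in range(n) if unrolled.startswith(rc_motif, i)]
    return {"+": plus, "-": minus}


# Toy 17 nt circle in which the motif spans the linearization point.
toy_genome = "TTACGGGGCCCCTAAGA"
hits = find_motif_circular(toy_genome)
```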

3.4. Microviruses

Microviruses are bacteriophages and the family Microviridae is divided into two subfamilies, Bullavirinae and Gokushovirinae [26]. Microviruses that have been well studied are known to infect enterobacteria. Thus, the detection of these viruses in the tortoise fecal samples is highly likely to be associated with their gut microbial flora. Over the last couple of years, there have been a large number of microviruses that have been identified in various sample types from vertebrates, invertebrates, and environmental samples. Despite there being >2500 genomes in GenBank which are very diverse, microviruses are poorly classified at a taxonomic level.
In this study, we identified 119 genomes (size range 4217–6549 nt) of microviruses from the 40 samples, of which 111 share less than 98% genome-wide nucleotide identity (Figure 4). All of these 119 microviruses encode at least an MCP, a DNA pilot protein and a replication initiator protein, with the exception of MK765635, which appears to be missing a DNA pilot protein. The genomic organization in terms of gene order varies across the microviruses identified in this study (Figure 4). The ORFs coding for the replication initiator protein of MK765582 and MK765642 each appear to have an intron. Although introns are rare in bacteria, introns have been identified in bacteriophage ORFs [73].
Of all the proteins encoded by microviruses, the MCP is the most conserved and thus is generally used to determine relationships between these viruses [74]. Of the >2500 genomes of microvirus available in public databases, only a handful have been classified into three genera, Bdellomicrovirus (two species), Chlamydiamicrovirus (four species) and Spiromicrovirus (one species) for Gokushovirinae. Similarly, there are only three genera for Bullavirinae (Alphatrevirus, Gequatrovirus and Sinsheimervirus). The MCPs of the 119 microviruses identified in this study share 25%–100% amino acid identity amongst themselves and 25%–76.7% with those of previously identified microviruses (Figure 4).
Beyond the official recognition of two sub-families for Microviridae, a handful of clades have been identified that may potentially be considered as sub-families [50,51]. Here, we refer to these as Alphavirinae-, Parabacteroides- and Pichovirinae-clades (Figure 5). Further, there are nine singletons and 13 clades that we have identified that are also unique (Supplementary Data 1).
Of the 119 microviruses identified here, six belong to the sub-family Gokushovirinae, two to the Parabacteroides clade, seven to the proposed Pichovirinae clade and 14 to the Alphavirinae clade. The largest share of the microviruses from this study fall within clade 5 (n = 51) and the remainder fall within clades 8 (n = 4), 9 (n = 11), 11 (n = 1), 14 (n = 2) and 17 (n = 21) (Supplementary Data 1). The tortoise feces derived microviruses represent 52.4% (11/21) of clade 9, ~46.4% (51/110) of clade 5, ~40.4% (21/52) of clade 17, 4.7% of clade 14 and 3.0% (4/133) of clade 8, whereas clade 11 is a singleton. Within clade 5, the microviruses are derived from various animal samples including capybara (n = 11), cow (n = 6), chimpanzee (n = 3), dog (n = 1), fish (n = 2), macaque (n = 3), mink (n = 4), moose (n = 2), pig (n = 19), rat (n = 1), tortoise (n = 51) and yak (n = 6) (Supplementary Data 1). Within clade 17, the microviruses are derived from the bacterial species Citromicrobium sp. (n = 1) and Ruegeria pomeroyi (n = 2) and from various animal samples (ciona, n = 6; cow, n = 2; fish, n = 17; mouse, n = 1; nematode, n = 1; tortoise, n = 21; unknown animal, n = 1; Supplementary Data 1).
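The clade assignments and percentages above can be checked with a short Python calculation. Only the fractions stated explicitly in the text are recomputed; the denominator for clade 14 is not given and is therefore left out.

```python
# Tortoise-derived microviruses per group, as listed in the text.
assignments = {
    "Gokushovirinae": 6,
    "Parabacteroides clade": 2,
    "Pichovirinae clade": 7,
    "Alphavirinae clade": 14,
    "clade 5": 51,
    "clade 8": 4,
    "clade 9": 11,
    "clade 11": 1,
    "clade 14": 2,
    "clade 17": 21,
}
total_assigned = sum(assignments.values())

# (tortoise-derived, clade total) pairs stated in the text.
stated_fractions = {
    "clade 9": (11, 21),
    "clade 5": (51, 110),
    "clade 17": (21, 52),
    "clade 8": (4, 133),
}
percentages = {
    clade: round(100 * part / whole, 1)
    for clade, (part, whole) in stated_fractions.items()
}

# Host/source counts listed for clade 17 (bacteria plus animal samples).
clade_17_sources = [1, 2, 6, 2, 17, 1, 1, 21, 1]
clade_17_total = sum(clade_17_sources)
```

The per-group counts sum to the 119 microviruses reported, and the clade 17 source counts sum to the stated clade size of 52.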
Based on the identification of the 119 microviruses and the large number of unclassified ones, it is evident that these viruses, like CRESS DNA viruses, are highly diverse, found in various sample types and appear to have variants in their gene order within the genomes. It is also evident that the taxonomy of these viruses needs to be more thoroughly assessed.

4. Conclusions

In this study, using 40 fecal samples from Sonoran Desert tortoises collected in Arizona (USA), we identified 156 novel viruses, including 27 genomoviruses, 10 unclassified CRESS DNA viruses, and 119 microviruses. The genomoviruses and microviruses likely infect organisms in the diet or gut of the tortoise, whereas the diverse unclassified viruses may infect the tortoises themselves or organisms associated with them. Further studies using tortoise tissue or blood samples would be needed to determine whether these unclassified viruses infect tortoises. Nonetheless, here, we highlight the high diversity of ssDNA viruses in Sonoran Desert tortoise fecal matter.
While some pathogens, such as the Mycoplasma bacterium that causes upper respiratory tract disease in tortoises, have been well studied, no study other than this one has attempted to evaluate their viral diversity. Further studies are required to elucidate whether these novel viruses are associated with the tortoises themselves or with their diet and the desert environment.

Supplementary Materials

The following are available online at https://www.mdpi.com/1999-4915/12/2/143/s1, Supplementary Figure 1: Approximately Maximum-Likelihood cladogram of the MCP sequences (n = 2797). Branches are color coded based on sub-families (Bullavirinae and Gokushovirinae) and Alphavirinae-, Parabacteroides- and Pichovirinae-clades. In addition to these, 22 unique clades are marked with numbers. Branches in grey represent an additional nine singletons and 13 clades. Branches in red denote sequences identified in this study. The outer circle represents taxa with some level of classification assigned prior to this study. Branch support with >0.8 aLRT is shown. Taxa names and assignment are provided in Supplementary Data 1. Supplementary Figure 2: Approximately Maximum-Likelihood phylogram of the MCP sequences (n = 2797). Branches are color coded based on sub-families (Bullavirinae and Gokushovirinae) and Alphavirinae-, Parabacteroides- and Pichovirinae-clades. Branches in grey represent an additional nine singletons and 13 clades. Branches in red denote sequences identified in this study. The outer circle represents taxa with some level of classification assigned prior to this study. Branch support with >0.8 aLRT is shown. Taxa names and assignment are provided in Supplementary Data 1. Supplementary Table 1: Details of abutting primers used to recover the full genomes of genomoviruses and unclassified CRESS DNA viruses for this study. Supplementary Data 1: Summary of microviruses whose MCPs were used to generate the phylogeny presented in Figure 5 and Supplementary Figures S1 and S2. Details of accession number, sub-family/putative sub-families (based on phylogeny presented in this study), genome length, country of identification, and isolation source are provided.

Author Contributions

Conceptualization, T.H.W., M.A.W., K.K., G.A.D., and A.V.; methodology, J.P.O., M.M., R.S.F., K.S., S.K., D.J.L., T.H.W., M.A.W., K.K., G.A.D., and A.V.; validation, J.P.O., M.M., R.S.F., K.S., S.K., G.A.D., and A.V.; formal analysis, J.P.O., M.M., R.S.F., G.A.D., and A.V.; investigation, J.P.O., M.M., R.S.F., K.S., S.K., D.J.L., T.H.W., M.A.W., K.K., G.A.D., and A.V.; resources, T.H.W., M.A.W., K.K., G.A.D., and A.V.; data curation, R.S.F. and A.V.; writing—original draft preparation, J.P.O., M.M., R.S.F., K.S., S.K., G.A.D., and A.V.; writing—review and editing, J.P.O., M.M., R.S.F., K.S., S.K., D.J.L., T.H.W., M.A.W., K.K., G.A.D., and A.V.; visualization, R.S.F., S.K., and A.V.; supervision, K.K., G.A.D., and A.V.; project administration, G.A.D. and A.V.; funding acquisition, A.V. All authors have read and agreed to the published version of the manuscript.

Funding

The molecular work described in this manuscript was funded by a startup grant from Arizona State University awarded to A.V.

Acknowledgments

The authors would like to thank H. Hoffman, A. Owens, T. Jones, K. Sullivan and A. Scuderi for assistance with field work.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Medica, P.A.; Nussear, K.E.; Esque, T.C.; Saethre, M.B. Long-Term Growth of Desert Tortoises (Gopherus agassizii) in a Southern Nevada Population. J. Herpetol. 2012, 46, 213–220. [Google Scholar] [CrossRef]
  2. Curtin, A.J.; Curtin, A.J.; Zug, G.R.; Spotila, J.R. Longevity and growth strategies of the desert tortoise (Gopherus agassizii) in two American deserts. J. Arid Environ. 2009, 73, 463–471. [Google Scholar] [CrossRef]
  3. Bailey, S.J.; Schwalbe, C.R.; Lowe, C.H. Hibernaculum Use by a Population of Desert Tortoises (Gopherus-Agassizii) in the Sonoran Desert. J. Herpetol. 1995, 29, 361–369. [Google Scholar] [CrossRef]
  4. Nagy, K.A.; Medica, P.A. Physiological Ecology of Desert Tortoises in Southern Nevada. Herpetologica 1986, 42, 73–92. [Google Scholar]
  5. Grover, M.C.; DeFalco, L.A. Desert Tortoise (Gopherus Agassizii): Status-of-Knowledge Outline with References; Gen. Tech. Rep. INT-GTR-316; US Department of Agriculture, Intermountain Research Station: Ogden, UT, USA, 1995; Volume 316, p. 134.
  6. Drake, K.K.; Bowen, L.; Nussear, K.E.; Esque, T.C.; Berger, A.J.; Custer, N.A.; Waters, S.C.; Johnson, J.D.; Miles, A.K.; Lewison, R.L. Negative impacts of invasive plants on conservation of sensitive desert wildlife. Ecosphere 2016, 7, e01531. [Google Scholar] [CrossRef]
  7. Murphy, R.W.; Berry, K.H.; Edwards, T.; Leviton, A.E.; Lathrop, A.; Riedle, J.D. The dazed and confused identity of Agassiz’s land tortoise, Gopherus agassizii (Testudines, Testudinidae) with the description of a new species, and its consequences for conservation. ZooKeys 2011, 39. [Google Scholar] [CrossRef] [PubMed]
  8. Edwards, T.; Karl, A.E.; Vaughn, M.; Rosen, P.C.; Torres, C.M.; Murphy, R.W. The desert tortoise trichotomy: Mexico hosts a third, new sister-species of tortoise in the Gopherus morafkai–G. agassizii group. ZooKeys 2016, 131. [Google Scholar] [CrossRef]
  9. Dolby, G.A.; Dorsey, R.J.; Graham, M.R. A legacy of geo-climatic complexity and genetic divergence along the lower Colorado River: Insights from the geological record and 33 desert-adapted animals. J. Biogeogr. 2019, 46, 2479–2505. [Google Scholar] [CrossRef]
  10. Brown, M.B.; Schumacher, I.M.; Klein, P.A.; Harris, K.; Correll, T.; Jacobson, E.R. Mycoplasma agassizii causes upper respiratory tract disease in the desert tortoise. Infect. Immun. 1994, 62, 4580–4586. [Google Scholar] [CrossRef] [PubMed]
  11. Service, U.F.a.W. Species Status Assessment for the Sonoran Desert Tortoise; Version 1.0, September 2015; Southwest Region US Fish and Wildlife Service: Albuquerque, NM, USA, 2015.
  12. Dickinson, V.M.; Schumacher, I.M.; Jarchow, J.L.; Duck, T.; Schwalbe, C.R. Mycoplasmosis in free-ranging desert tortoises in Utah and Arizona. J. Wildl. Dis. 2005, 41, 839–842. [Google Scholar] [CrossRef] [PubMed]
  13. Jones, C.A. Mycoplasma agassizii in the Sonoran population of the desert tortoise in Arizona. Master’s Thesis, The University of Arizona, Tucson, AZ, USA, 2008. [Google Scholar]
  14. Ng, T.F.; Wellehan, J.F.; Coleman, J.K.; Kondov, N.O.; Deng, X.; Waltzek, T.B.; Reuter, G.; Knowles, N.J.; Delwart, E. A tortoise-infecting picornavirus expands the host range of the family Picornaviridae. Arch. Virol. 2015, 160, 1319–1323. [Google Scholar] [CrossRef] [PubMed]
  15. Farkas, S.L.; Ihasz, K.; Feher, E.; Bartha, D.; Jakab, F.; Gal, J.; Banyai, K.; Marschang, R.E. Sequencing and phylogenetic analysis identifies candidate members of a new picornavirus genus in terrestrial tortoise species. Arch. Virol. 2015, 160, 811–816. [Google Scholar] [CrossRef] [PubMed]
  16. Stohr, A.C.; Lopez-Bueno, A.; Blahak, S.; Caeiro, M.F.; Rosa, G.M.; Alves de Matos, A.P.; Martel, A.; Alejo, A.; Marschang, R.E. Phylogeny and differentiation of reptilian and amphibian ranaviruses detected in Europe. PLoS ONE 2015, 10, e0118633. [Google Scholar] [CrossRef] [PubMed]
  17. Gandar, F.; Wilkie, G.S.; Gatherer, D.; Kerr, K.; Marlier, D.; Diez, M.; Marschang, R.E.; Mast, J.; Dewals, B.G.; Davison, A.J.; et al. The Genome of a Tortoise Herpesvirus (Testudinid Herpesvirus 3) Has a Novel Structure and Contains a Large Region That Is Not Required for Replication In Vitro or Virulence In Vivo. J. Virol. 2015, 89, 11438–11456. [Google Scholar] [CrossRef]
  18. Schumacher, V.L.; Innis, C.J.; Garner, M.M.; Risatti, G.R.; Nordhausen, R.W.; Gilbert-Marcheterre, K.; Wellehan, J.F., Jr.; Childress, A.L.; Frasca, S., Jr. Sulawesi tortoise adenovirus-1 in two impressed tortoises (Manouria impressa) and a Burmese star tortoise (Geochelone platynota). J. Zoo Wildl. Med. 2012, 43, 501–510. [Google Scholar] [CrossRef]
  19. Rivera, S.; Wellehan, J.F., Jr.; McManamon, R.; Innis, C.J.; Garner, M.M.; Raphael, B.L.; Gregory, C.R.; Latimer, K.S.; Rodriguez, C.E.; Diaz-Figueroa, O.; et al. Systemic adenovirus infection in Sulawesi tortoises (Indotestudo forsteni) caused by a novel siadenovirus. J. Vet. Diagn. Invest. 2009, 21, 415–426. [Google Scholar] [CrossRef]
  20. Garcia-Morante, B.; Penzes, J.J.; Costa, T.; Martorell, J.; Martinez, J. Hyperplastic stomatitis and esophagitis in a tortoise (Testudo graeca) associated with an adenovirus infection. J. Vet. Diagn. Invest. 2016, 28, 579–583. [Google Scholar] [CrossRef]
  21. Doszpoly, A.; Wellehan, J.F., Jr.; Childress, A.L.; Tarjan, Z.L.; Kovacs, E.R.; Harrach, B.; Benko, M. Partial characterization of a new adenovirus lineage discovered in testudinoid turtles. Infect. Genet. Evol. 2013, 17, 106–112. [Google Scholar] [CrossRef]
  22. Papp, T.; Seybold, J.; Marschang, R.E. Paramyxovirus Infection in a Leopard Tortoise (Geochelone pardalis babcocki) with Respiratory Disease. J. Herpetol. Med. Surg. 2010, 20, 64–68. [Google Scholar] [CrossRef]
  23. Marschang, R.E.; Papp, T.; Frost, J.W. Comparison of paramyxovirus isolates from snakes, lizards and a tortoise. Virus Res. 2009, 144, 272–279. [Google Scholar] [CrossRef]
  24. Svet-Moldavsky, G.J.; Trubcheninova, L.; Ravkina, L.I. Sarcomas in reptiles induced with Rous virus. Folia Biol. (Praha) 1967, 13, 84. [Google Scholar] [PubMed]
  25. Varsani, A.; Krupovic, M. Sequence-based taxonomic framework for the classification of uncultured single-stranded DNA viruses of the family Genomoviridae. Virus Evol. 2017, 3, vew037. [Google Scholar] [CrossRef] [PubMed]
  26. Cherwa, J.E.J.; Fane, B.A. Microviridae. In Virus Taxonomy; King, A.M.Q., Adams, M.J., Carstens, E.B., Lefkowitz, E.J., Eds.; Elsevier: San Diego, CA, USA, 2012; pp. 385–393. [Google Scholar] [CrossRef]
  27. Kazlauskas, D.; Dayaram, A.; Kraberger, S.; Goldstien, S.; Varsani, A.; Krupovic, M. Evolutionary history of ssDNA bacilladnaviruses features horizontal acquisition of the capsid gene from ssRNA nodaviruses. Virology 2017, 504, 114–121. [Google Scholar] [CrossRef] [PubMed]
  28. Breitbart, M.; Delwart, E.; Rosario, K.; Segales, J.; Varsani, A.; ICTV Report Consortium. ICTV Virus Taxonomy Profile: Circoviridae. J. Gen. Virol. 2017, 98, 1997–1998. [Google Scholar] [CrossRef]
  29. Zerbini, F.M.; Briddon, R.W.; Idris, A.; Martin, D.P.; Moriones, E.; Navas-Castillo, J.; Rivera-Bustamante, R.; Roumagnac, P.; Varsani, A.; ICTV Report Consortium. ICTV Virus Taxonomy Profile: Geminiviridae. J. Gen. Virol. 2017, 98, 131–133. [Google Scholar] [CrossRef]
  30. Vetten, H.J.; Dale, J.L.; Grigoras, I.; Gronenborn, B.; Harding, R.; Randles, J.W.; Sano, Y.; Thomas, J.E.; Timchenko, T.; Yeh, H.H. Nanoviridae. In Virus Taxonomy; King, A.M.Q., Adams, M.J., Carstens, E.B., Lefkowitz, E.J., Eds.; Elsevier: San Diego, CA, USA, 2012; pp. 395–404. [Google Scholar] [CrossRef]
  31. Varsani, A.; Krupovic, M. Smacoviridae: A new family of animal-associated single-stranded DNA viruses. Arch. Virol. 2018, 163, 2005–2015. [Google Scholar] [CrossRef]
  32. Varsani, A.; Krupovic, M. Correction to: Smacoviridae: A new family of animal-associated single-stranded DNA viruses. Arch. Virol. 2018, 163, 3213–3214. [Google Scholar] [CrossRef]
  33. Steel, O.; Kraberger, S.; Sikorski, A.; Young, L.M.; Catchpole, R.J.; Stevens, A.J.; Ladley, J.J.; Coray, D.S.; Stainton, D.; Dayaram, A.; et al. Circular replication-associated protein encoding DNA viruses identified in the faecal matter of various animals in New Zealand. Infect. Genet. Evol. 2016, 43, 151–164. [Google Scholar] [CrossRef]
  34. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120. [Google Scholar] [CrossRef]
  35. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D.; et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012, 19, 455–477. [Google Scholar] [CrossRef]
  36. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410. [Google Scholar] [CrossRef]
  37. Kearse, M.; Moir, R.; Wilson, A.; Stones-Havas, S.; Cheung, M.; Sturrock, S.; Buxton, S.; Cooper, A.; Markowitz, S.; Duran, C.; et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 2012, 28, 1647–1649. [Google Scholar] [CrossRef] [PubMed]
  38. Bushnell, B. BBMap: A Fast, Accurate, Splice-Aware Aligner; Lawrence Berkeley National Lab.(LBNL): Berkeley, CA, USA, 2014. [Google Scholar]
  39. Delcher, A.L.; Harmon, D.; Kasif, S.; White, O.; Salzberg, S.L. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999, 27, 4636–4641. [Google Scholar] [CrossRef] [PubMed]
  40. Muhire, B.M.; Varsani, A.; Martin, D.P. SDT: A virus classification tool based on pairwise sequence alignment and identity calculation. PLoS ONE 2014, 9, e108277. [Google Scholar] [CrossRef] [PubMed]
  41. Fontenele, R.S.; Lacorte, C.; Lamas, N.S.; Schmidlin, K.; Varsani, A.; Ribeiro, S.G. Single Stranded DNA Viruses Associated with Capybara Faeces Sampled in Brazil. Viruses 2019, 11, 710. [Google Scholar] [CrossRef]
  42. Gerlt, J.A.; Bouvier, J.T.; Davidson, D.B.; Imker, H.J.; Sadkhin, B.; Slater, D.R.; Whalen, K.L. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta 2015, 1854, 1019–1037. [Google Scholar] [CrossRef]
  43. Zallot, R.; Oberg, N.O.; Gerlt, J.A. ‘Democratized’ genomic enzymology web tools for functional assignment. Curr. Opin. Chem. Biol. 2018, 47, 77–85. [Google Scholar] [CrossRef]
  44. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13, 2498–2504. [Google Scholar] [CrossRef]
  45. Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797. [Google Scholar] [CrossRef]
  46. Guindon, S.; Dufayard, J.F.; Lefort, V.; Anisimova, M.; Hordijk, W.; Gascuel, O. New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. Syst. Biol. 2010, 59, 307–321. [Google Scholar] [CrossRef]
  47. Darriba, D.; Taboada, G.L.; Doallo, R.; Posada, D. ProtTest 3: Fast selection of best-fit models of protein evolution. Bioinformatics 2011, 27, 1164–1165. [Google Scholar] [CrossRef] [PubMed]
  48. Letunic, I.; Bork, P. Interactive Tree Of Life (iTOL) v4: Recent updates and new developments. Nucleic Acids Res. 2019, 47, W256–W259. [Google Scholar] [CrossRef] [PubMed]
  49. Stover, B.C.; Muller, K.F. TreeGraph 2: Combining and visualizing evidence from different phylogenetic analyses. BMC Bioinform. 2010, 11, 7. [Google Scholar] [CrossRef] [PubMed]
  50. Roux, S.; Krupovic, M.; Poulet, A.; Debroas, D.; Enault, F. Evolution and diversity of the Microviridae viral family through a collection of 81 new complete genomes assembled from virome reads. PLoS ONE 2012, 7, e40418. [Google Scholar] [CrossRef] [PubMed]
  51. Krupovic, M.; Forterre, P. Microviridae goes temperate: Microvirus-related proviruses reside in the genomes of Bacteroidetes. PLoS ONE 2011, 6, e19893. [Google Scholar] [CrossRef]
  52. Pei, J.; Grishin, N.V. PROMALS3D: Multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information. Methods Mol. Biol. 2014, 1079, 263–271. [Google Scholar] [CrossRef]
  53. Pei, J.; Kim, B.H.; Grishin, N.V. PROMALS3D: A tool for multiple protein sequence and structure alignments. Nucleic Acids Res. 2008, 36, 2295–2300. [Google Scholar] [CrossRef]
  54. Price, M.N.; Dehal, P.S.; Arkin, A.P. FastTree 2: Approximately maximum-likelihood trees for large alignments. PLoS ONE 2010, 5, e9490. [Google Scholar] [CrossRef]
  55. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780. [Google Scholar] [CrossRef]
  56. Martin, D.P.; Murrell, B.; Golden, M.; Khoosal, A.; Muhire, B. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol. 2015, 1, vev003. [Google Scholar] [CrossRef]
  57. Martin, D.; Rybicki, E. RDP: Detection of recombination amongst aligned sequences. Bioinformatics 2000, 16, 562–563. [Google Scholar] [CrossRef]
  58. Padidam, M.; Sawyer, S.; Fauquet, C.M. Possible emergence of new geminiviruses by frequent recombination. Virology 1999, 265, 218–225. [Google Scholar] [CrossRef] [PubMed]
  59. Martin, D.P.; Posada, D.; Crandall, K.A.; Williamson, C. A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res. Hum. Retroviruses 2005, 21, 98–102. [Google Scholar] [CrossRef] [PubMed]
  60. Smith, J.M. Analyzing the mosaic structure of genes. J. Mol. Evol. 1992, 34, 126–129. [Google Scholar] [CrossRef] [PubMed]
  61. Posada, D.; Crandall, K.A. Evaluation of methods for detecting recombination from DNA sequences: Computer simulations. Proc. Natl. Acad. Sci. USA 2001, 98, 13757–13762. [Google Scholar] [CrossRef] [PubMed]
  62. Gibbs, M.J.; Armstrong, J.S.; Gibbs, A.J. Sister-scanning: A Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics 2000, 16, 573–582. [Google Scholar] [CrossRef]
  63. Boni, M.F.; Posada, D.; Feldman, M.W. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 2007, 176, 1035–1047. [Google Scholar] [CrossRef]
  64. Kazlauskas, D.; Varsani, A.; Koonin, E.V.; Krupovic, M. Multiple origins of prokaryotic and eukaryotic single-stranded DNA viruses from bacterial and archaeal plasmids. Nat. Commun. 2019, 10, 3425. [Google Scholar] [CrossRef]
  65. Rosario, K.; Duffy, S.; Breitbart, M. A field guide to eukaryotic circular single-stranded DNA viruses: Insights gained from metagenomics. Arch. Virol. 2012, 157, 1851–1871. [Google Scholar] [CrossRef]
  66. Krupovic, M.; Ghabrial, S.A.; Jiang, D.; Varsani, A. Genomoviridae: A new family of widespread single-stranded DNA viruses. Arch. Virol. 2016, 161, 2633–2643. [Google Scholar] [CrossRef]
  67. Yu, X.; Li, B.; Fu, Y.; Jiang, D.; Ghabrial, S.A.; Li, G.; Peng, Y.; Xie, J.; Cheng, J.; Huang, J.; et al. A geminivirus-related DNA mycovirus that confers hypovirulence to a plant pathogenic fungus. Proc. Natl. Acad. Sci. USA 2010, 107, 8387–8392. [Google Scholar] [CrossRef] [PubMed]
  68. Kraberger, S.; Visnovsky, G.A.; van Toor, R.F.; Male, M.F.; Waits, K.; Fontenele, R.S.; Varsani, A. Genome Sequences of Two Single-Stranded DNA Viruses Identified in Varroa destructor. Genome Announc. 2018, 6, e00107-18. [Google Scholar] [CrossRef] [PubMed]
  69. Schmidlin, K.; Sepp, T.; Khalifeh, A.; Smith, K.; Fontenele, R.S.; McGraw, K.J.; Varsani, A. Diverse genomoviruses representing eight new and one known species identified in feces and nests of house finches (Haemorhous mexicanus). Arch. Virol. 2019, 164, 2345–2350. [Google Scholar] [CrossRef] [PubMed]
  70. Kerr, M.; Rosario, K.; Baker, C.C.M.; Breitbart, M. Discovery of Four Novel Circular Single-Stranded DNA Viruses in Fungus-Farming Termites. Genome Announc. 2018, 6, e00318-18. [Google Scholar] [CrossRef] [PubMed]
  71. Zhao, G.; Vatanen, T.; Droit, L.; Park, A.; Kostic, A.D.; Poon, T.W.; Vlamakis, H.; Siljander, H.; Harkonen, T.; Hamalainen, A.M.; et al. Intestinal virome changes precede autoimmunity in type I diabetes-susceptible children. Proc. Natl. Acad. Sci. USA 2017, 114, E6166–E6175. [Google Scholar] [CrossRef] [PubMed]
  72. Pearson, V.M.; Caudle, S.B.; Rokyta, D.R. Viral recombination blurs taxonomic lines: Examination of single-stranded DNA viruses in a wastewater treatment plant. PeerJ 2016, 4, e2585. [Google Scholar] [CrossRef] [PubMed]
  73. Belfort, M. Bacteriophage introns: Parasites within parasites? Trends Genet. 1989, 5, 209–213. [Google Scholar] [CrossRef]
  74. Creasy, A.; Rosario, K.; Leigh, B.A.; Dishaw, L.J.; Breitbart, M. Unprecedented Diversity of ssDNA Phages from the Family Microviridae Detected within the Gut of a Protochordate Model Organism (Ciona robusta). Viruses 2018, 10, 404. [Google Scholar] [CrossRef]
Figure 1. A. Phylogenetic relationship of the Rep protein sequences of genomoviruses with genus-level clustering. B. Maximum likelihood phylogenetic trees of the Rep sequences of genomoviruses identified in this study that belong to the genera Gemykolovirus, Gemykibivirus and Gemycircularvirus. For each tree, a detailed view of the clusters that contain Reps from this study is provided to the right of the genus-level trees. The source of each sequence is given next to its accession number.
Figure 2. Illustration of the nine events of recombination detected by RDP4 in the genomes of gemycircularviruses (n = 6) and gemykibiviruses (n = 15). The black regions depict the recombinant region, with I–IX denoting the recombination events (see Table 3 for details of recombination). The genome organization of genomoviruses is provided at the top showing the cp and rep genes.
Figure 3. Sequence similarity networks showing family-level clustering of the Reps of CRESS DNA viruses and Rep-encoding circular molecules. For each cluster that has a Rep from this study (colored in red), a phylogenetic tree (midpoint rooted) has been inferred for all sequences belonging to the cluster.
Figure 4. A. Linearized genome representation of the genomes of microviruses identified in this study with color coded open reading frames. B. Pairwise identities of the MCP amino acid sequences (showing highest identity) of the microviruses identified in this study against themselves and against those available in GenBank. Gokushovirinae is an officially recognized sub-family of the family Microviridae. Alphavirinae, Pichovirinae and Parabacteroides group are proposed groups within the family Microviridae [50,51]. Clade group is based on the phylogenetic analysis of the MCPs (Figure 5; Supplementary Data 1).
Figure 5. Approximately Maximum-Likelihood cladogram of the MCP sequences (n = 2797). Branches are color coded based on sub-families (Bullavirinae and Gokushovirinae) and Alphavirinae-, Parabacteroides- and Pichovirinae-clades. In addition to these, 22 unique clades are marked with numbers. Branches in grey represent an additional nine singletons and 13 clades. Branches in red denote sequences identified in this study. The outer circle represents taxa with some level of classification assigned prior to this study. Branch support with >0.8 aLRT is shown. Detailed cladogram and phylogram are provided in Supplementary Figures S1 and S2 and taxa names and assignment are provided in Supplementary Data 1.
Table 1. Summary of the rolling circle replication (RCR) endonuclease motifs (Motif I, II, III) and superfamily 3 (SF3) helicase motifs (Walker A, Walker B, and Motif C) encoded by the genomoviruses, unclassified CRESS DNA viruses and Rep-encoding circular molecules identified in this study.
Rolling Circle Replication (RCR) Endonuclease | Superfamily 3 (SF3) Helicase Motifs
Genus/Group | Accession # | Motif I | Motif II | Motif III | Walker A | Walker B | Motif C
GemykolovirusMK570209MLTYSDPHFHCRRWDYVGKGATRLGKTVWARVFDDIWLCN
MK570213FLTYSNPHFHCRRWDYVGKGETRLGKTVWARIFDDIWICN
GemykibivirusMK570202LLTYPQVHLHAKGAAYAIKGGTRLGKTLWARVFDDMYISN
MK570214LLTYPQVHLHAKGAAYAIKGGTRLGKTLWARIFDDMYISN
MK570215LLTYPQVHLHAKGAAYAIKGGTRLGKTLWARIFDDMYISN
MK570205LLTYPQIHLHAKGYAYAIKGPTRLGKTLWARVFDDMYISN
MK570211LLTYPQVHLHAKGYAYAIKGPTRLGKTLWARVFDDMYISN
MK570216LLTYPQVHLHAKGYAYAVKGPTRLGKTLWARVFDDMYISN
MK570207LLTYPQIHLHAKGYAYATKGPTRLGKTLWARVFDDMYISN
MK570208LLTYPQVHLHAKGYAYAIKGPTRLGKTLWARVFDDMYISN
MK570201LLTYPQIHLHAKGYAYAIKGPTRLGKTLWARIFDDMYISN
MK570203LLTYPQVHLHAKGYAYAIKGPTRLGKTLWARIFDDMYISN
MF373640LLTYPQVHLHAKGYAYAIKGPTRLGKTLWARVFDDMYISN
MK570206LLTYPQVHLHAKGYAYAIKGPTRLGKTLWARVFDDMYISN
MF373641LLTYPQVHLHAKGYAYAIKGPTRLGKTLWARVFDDMYISN
MF373638LLTYPQVHLHAKGYAYAIKGPTRLGKTLWARVFDDMYISN
MF373639LLTYPQVHLHAKGYAYAVKGPTRLGKTLWARVFDDMYISN
GemycircularvirusMK570218LITYAQVHLHAKGYDYAIKGDSQLGKTVWARIFDDMWLCN
MK570204LLTYPQIHLHAKGYDYAIKGDSQLGKTVWARVFDDMWLCN
MK570212LLTYPQFHLHAKGYDYAIKGDSQLGKTLWARVFDDMWLCN
MK570210LLTYAQIHLHVTMYDYAIKGESRLGKTVWARVFDDMWLAN
MK570223LFTYAQIHFHVTAYDYACKGPYGCGKTVWARIFDDWWLCN
MK570217LITYSQVHLHCKGWDYACKGASQTGKTLWARVFDDIWLSN
MK570222LITYSQIHLHCKGWDYACKGASQTGKTLWARVFDDIWLSN
MK570219LITYSQIHLHCKGYDYAIKGASQTGKTLWARVFDDIWLSN
MK570220LITYSQIHLHCKGYDYAIKGASQTGKTLWARVFDDIWLSN
MK570221LLTYAQSHLHCAGFDYACKGEPLTGKTDWARIFDDIWCAN
Unclassified CRESS DNA virusesMK858252FLTYPQDHLHADVYNYVIKGPSKTGKTQWARVIDDMILCN
MK858253FLTYPQDHLHADVYNYVIKGPSKTGKTQWARVIDDMILCN
MK858254FLTYPQDHLHADVYNYVTKGPSKTGKTAWARVIDDMILCN
MK858255FLTYPQNHLHVDVYAYITKGASKTGKTQWARVLDDLILCN
MK858256FLTYPRDHLHVHVYRYVRKGPSKTGKTEWARIFDDLILCN
MK858257FLTFARDHRHVGARQYTQKGPSKTGKTHWARVFDDLILCN
MF373642FLTYPQPHLHCAVRRYCSKGNTETGKTTLAKILDDMITSN
MK858265LVTWSQLHYHADALAYVKKGPTGSGKTRCAIIFDDM-
MK858262CKKYRRPHIQGECVTYCKKGPSGVGKTREVEVFDDFRITN
MK858258---GGSNTGKTTYLRWIDEFTLKN
Rep-encoding circular moleculesMK858259CFTWNNPHIQGDNFKYCTKGPAGTGKTTWGRCIEDYVTSN
MK858260CFTWNNPHIQGDNFKYCTKGPAGTGKTTWGRCIEDYVTSN
MK858263LLTFNNYHTHLENRAYVLKGETGTGKTSSVMLFDEFLVSN
MK858264-YHTHLENREYIRKGSTGTGKTSYVMLFDEFIISN
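As a rough illustration of how a motif such as the Walker A entries in Table 1 can be located in a Rep amino acid sequence, the sketch below scans for the widely used P-loop consensus GxxxxGK[ST]. The pattern choice, the function name `find_walker_a` and the example sequence are assumptions made for illustration; this is not the procedure used to compile Table 1.

```python
import re

# Assumed consensus for the Walker A (P-loop) motif: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_walker_a(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each non-overlapping Walker A-like hit."""
    return [(m.start(), m.group()) for m in WALKER_A.finditer(protein.upper())]


# Hypothetical Rep fragment, used only to exercise the function.
hits = find_walker_a("MAIKGPTRLGKTLWAR")
```

A strict consensus like this one will miss divergent motifs, so hits are best treated as candidates to confirm against an alignment.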
Table 2. Pairwise identity comparisons (showing highest identity) of the genome, and the replication associated protein (Rep) and capsid protein (CP) amino acid sequences of the genomoviruses identified in this study against themselves and those available in GenBank.
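Columns 3–8: genomoviruses from this study; columns 9–14: other genomoviruses.
Genus | Query Accession # | Genome % ID | Accession # | Rep % ID | Accession # | CP % ID | Accession # | Genome % ID | Accession # | Rep % ID | Accession # | CP % ID | Accession #
Gemykolovirus | MK570209 | 65.21 | MK570213 | 69.97 | MK570213 | 45.45 | MK570221 | 70.08 | MK939374 | 81.38 | MK939374 | 49.16 | KT862242
Gemykolovirus | MK570213 | 65.21 | MK570209 | 69.97 | MK570209 | 44.44 | MK570223 | 66.96 | MH545501 | 71.26 | MK939374 | 47.42 | MH545501
Gemykibivirus | MF373638 | 99.82 | MF373641 | 99.69 | MF373641 | 100.00 | MF373641 | 99.22 | MK249293 | 98.46 | MK249293 | 99.67 | MK249293
Gemykibivirus | MF373639 | 99.73 | MF373641 | 99.69 | MF373641 | 99.67 | MF373640 | 99.31 | MK249293 | 98.46 | MK249293 | 100.00 | MK249293
Gemykibivirus | MF373640 | 96.97 | MF373638 | 97.23 | MF373641 | 100.00 | MF373641 | 96.87 | MK249293 | 97.23 | MK249293 | 99.67 | MK249293
Gemykibivirus | MF373641 | 99.82 | MF373638 | 99.69 | MF373638 | 100.00 | MF373638 | 99.22 | MK249293 | 98.77 | MK249293 | 99.67 | MK249293
Gemykibivirus | MK570201 | 96.78 | MF373639 | 98.77 | MK570203 | 99.67 | MF373639 | 96.84 | MK249293 | 94.98 | MK947374 | 99.67 | MK249293
Gemykibivirus | MK570202 | 99.47 | MK570214 | 99.08 | MK570214 | 100.00 | MK570214 | 90.70 | MK249269 | 83.69 | MK249269 | 98.38 | MK249239
Gemykibivirus | MK570203 | 94.34 | MK570201 | 98.77 | MK570201 | 98.36 | MK570208 | 95.39 | MK249300 | 95.32 | MK947374 | 99.34 | MK249300
Gemykibivirus | MK570205 | 95.57 | MK570216 | 95.08 | MK570211 | 99.68 | MK570215 | 97.41 | MK249307 | 99.08 | MK249307 | 98.70 | MK249239
Gemykibivirus | MK570206 | 96.55 | MF373639 | 99.38 | MF373641 | 99.67 | MK570211 | 91.00 | MK947374 | 98.77 | MK249293 | 98.36 | MK249293
Gemykibivirus | MK570207 | 97.47 | MK570208 | 97.23 | MK570208 | 97.94 | MK570208 | 92.44 | MK249293 | 96.92 | MK249293 | 97.25 | MK249300
Gemykibivirus | MK570208 | 97.47 | MK570207 | 97.23 | MK570207 | 98.36 | MK570203 | 92.94 | MK249293 | 96.92 | MK249293 | 98.36 | MK249300
Gemykibivirus | MK570211 | 94.98 | MK570206 | 95.08 | MK570205 | 99.67 | MK570206 | 92.59 | MK249293 | 97.23 | MK249269 | 98.03 | MK249293
Gemykibivirus | MK570214 | 99.47 | MK570202 | 99.69 | MK570215 | 100.00 | MK570202 | 90.62 | MK249269 | 83.38 | MK249269 | 98.38 | MK249239
Gemykibivirus | MK570215 | 98.67 | MK570214 | 99.69 | MK570214 | 100.00 | MK570216 | 90.35 | MK249269 | 83.08 | MK249269 | 99.03 | MK249239
Gemykibivirus | MK570216 | 95.57 | MK570205 | 94.77 | MF373639 | 100.00 | MK570215 | 94.14 | MK249307 | 94.46 | MK249236 | 99.03 | MK249239
Gemycircularvirus | MK570204 | 81.28 | MK570212 | 93.99 | MK570212 | 61.76 | MK570210 | 73.52 | MK947372 | 82.53 | MK947372 | 70.10 | MG641202
Gemycircularvirus | MK570210 | 67.50 | MK570204 | 58.66 | MK570212 | 61.76 | MK570204 | 76.02 | MK939384 | 92.42 | MK939384 | 75.40 | JQ412056
Gemycircularvirus | MK570212 | 81.28 | MK570204 | 93.99 | MK570204 | 59.28 | MK570218 | 74.28 | MK947372 | 82.83 | MK947372 | 69.81 | MG571096
Gemycircularvirus | MK570217 | 97.56 | MK570222 | 94.80 | MK570222 | 99.02 | MK570222 | 68.13 | KM821747 | 76.76 | KJ547638 | 50.00 | KM510192
Gemycircularvirus | MK570218 | 80.78 | MK570212 | 87.39 | MK570204 | 59.28 | MK570212 | 74.01 | MK947372 | 80.72 | MK947372 | 63.61 | MG571100
Gemycircularvirus | MK570219 | 97.12 | MK570220 | 98.78 | MK570220 | 99.03 | MK570220 | 68.06 | KM821747 | 74.31 | KJ547638 | 53.87 | MK939446
Gemycircularvirus | MK570220 | 97.12 | MK570219 | 98.78 | MK570219 | 99.03 | MK570219 | 67.97 | MK939432 | 74.01 | KJ547638 | 55.08 | MK939446
Gemycircularvirus | MK570221 | 66.65 | MK570204 | 52.00 | MK570210 | 50.33 | MK570204 | 65.90 | MK939384 | 66.03 | MG641197 | 58.36 | KT732806
Gemycircularvirus | MK570222 | 97.56 | MK570217 | 96.94 | MK570219 | 99.02 | MK570217 | 67.91 | KM821747 | 74.62 | KM821747 | 50.43 | KM510192
Gemycircularvirus | MK570223 | 61.64 | MK570210 | 50.76 | MK570210 | 45.54 | MK570204 | 82.49 | MG571087 | 94.22 | MG571087 | 70.39 | KF413620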
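To make the "% ID" values in Table 2 concrete, the following minimal sketch computes a pairwise identity for two already-aligned sequences. It assumes one common convention, namely that positions where either sequence has a gap are excluded and identity is the share of matching positions among those that remain. The function name `pairwise_identity` and the toy inputs are hypothetical, and the code is not a reimplementation of the tool used in this study.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences, skipping columns where either has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the comparison (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Toy aligned fragments: five of the six compared positions match.
identity = pairwise_identity("LLTYPQ", "LLTYAQ")
```

Other conventions (for example, counting gaps as mismatches) give lower values for the same alignment, so the convention should be stated alongside any reported identity.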
Table 3. Summary of recombination events identified in the genomes of gemycircularviruses and gemykibiviruses from this study. Major and minor parents indicate sequences (GenBank accession # provided) related to parental sequences that respectively donated the larger and smaller regions of the recombinant genome. For each event the recombination detection method with the most significant associated p-value is indicated in bold. Recombination detection methods: RDP (R), GENCONV (G), BOOTSCAN (B), MAXCHI (M), CHIMERA (C), SISCAN (S) and 3SEQ (T). Sites where the actual breakpoint is undetermined are marked with *.
Event | Begin | End | Recombinant Sequence(s) | Minor Parental Sequence(s) | Major Parental Sequence(s) | p-Value | Method
I | 190 | 1017 | MK570210 | Unknown | MH939384 | 2.71 × 10^−25 | MCS
II | 249* | 1050 | MK570204, MK570212, MK570218 | Unknown | MK570204 | 6.55 × 10^−16 | GBMCST
III | 243* | 1009 | MK570218 | Unknown | MK570204 | 5.10 × 10^−14 | GBMCT
IV | 1539* | 1591 | MK570222, MK570217 | MF173067 | MK570220, MK570219 | 7.12 × 10^−8 | RGBT
V | 23 | 1144 | MK570202, MK570214, MK570215 | MK249302, MK249239, MK249243, MK249246, MK249252, MK249256, MK249268, MK249269, MK249275, MK249287, MK249288, MK249307, MK570205, MK570216 | Unknown | 1.67 × 10^−52 | RGBMCST
VI | 1275 | 2192 | MK570211 | MK249239, MK249243, MK249246, MK249252, MK249256, MK249268, MK249269, MK249275, MK249287, MK249288, MK249302, MK249307 | MF373638, MF373639, MF373640, MF373641, MK249293, MK570206 | 3.64 × 10^−86 | GBMCST
VII | 333 | 1113 | MF373638, MF373639, MF373640, MF373641, MK570201, MK570203, MK570206, MK570207, MK570208, MK570211 | MK483084 | MH973737, MH973738, MH973739, MH973740, MK249236, MK249241, MK249251, MK249255, MK249271, MK249276, MK249279, MK249286, MK249301 | 7.69 × 10^−30 | GBMCST
VIII | 1643 | 1958 | MK570216 | MH973737, MH973738, MH973739, MH973740, MK249236, MK249241, MK249251, MK249255, MK249271, MK249276, MK249279, MK249286, MK249293, MK249301, MK570206 | MK249239, MK249243, MK249246, MK249252, MK249256, MK249268, MK249269, MK249275, MK249287, MK249288, MK249302, MK249307, MK570205 | 5.32 × 10^−25 | RGBMCST
IX | 1993 | 2211 | MK570205, MK570216 | MF373638, MF373639, MF373640, MF373641, MH973737, MH973738, MH973739, MH973740, MK249236, MK249241, MK249251, MK249255, MK249271, MK249276, MK249279, MK249286, MK249293, MK249296, MK249301, MK570206, MK947374 | MK249239, MK249243, MK249246, MK249252, MK249256, MK249268, MK249269, MK249275, MK249287, MK249288, MK249302 | 7.83 × 10^−22 | GBMCST
Table 4. Pairwise identities (showing highest identity) of the replication associated protein (Rep) and capsid protein (CP) amino acid sequences of the unclassified CRESS DNA viruses and circular molecules identified in this study against themselves and against those available in GenBank.
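Columns 3–6: unclassified CRESS DNA viruses from this study; columns 7–10: other unclassified CRESS DNA viruses.
Virus Group/Molecules | Query Accession # | Rep Accession # | Rep %ID | CP Accession # | CP %ID | Rep Accession # | Rep %ID | CP Accession # | CP %ID
Unclassified CRESS DNA virus | MF373642 | MK858254 | 29.76 | - | - | KY487833 | 42.00 | - | -
Unclassified CRESS DNA virus | MK858252 | MK858253 | 100.00 | MK858253 | 99.00 | MK570182 | 78.42 | MK858254 | 38.68
Unclassified CRESS DNA virus | MK858253 | MK858252 | 100.00 | MK858253 | 99.00 | MK570182 | 78.42 | MK858254 | 38.68
Unclassified CRESS DNA virus | MK858254 | MK858252 | 81.16 | MK858255 | 39.16 | MK570182 | 75.99 | MK570182 | 39.25
Unclassified CRESS DNA virus | MK858255 | MK858252 | 79.03 | MK858254 | 39.25 | MK570182 | 83.57 | MK570165 | 53.64
Unclassified CRESS DNA virus | MK858256 | MK858255 | 65.75 | MK858257 | 33.47 | MK570182 | 64.44 | MK570168 | 33.46
Unclassified CRESS DNA virus | MK858257 | MK858256 | 56.23 | MK858254 | 25.21 | MK570165 | 54.08 | MK570168 | 42.45
Unclassified CRESS DNA virus | MK858258 | MK858260 | 35.57 | MK858261 | 52.65 | MK570170 | 45.86 | - | -
Unclassified CRESS DNA virus | MK858262 | MK858265 | 30.80 |  |  | MH616676 | 46.77 | KY302869 | 28.12
Unclassified CRESS DNA virus | MK858265 | MK858262 | 30.80 | - | - | KM821751 | 30.80 | - | -
Circular molecule | MK858259 | MK858260 | 99.70 | - | - | MF118167 | 49.34 | - | -
Circular molecule | MK858260 | MK858259 | 99.70 | - | - | MF118167 | 49.34 | - | -
Circular molecule | MK858261 | - | - | MK858258 | 52.65 | - | - | - | -
Circular molecule | MK858263 | MK858264 | 72.53 | - | - | MH616676 | 47.17 | - | -
Circular molecule | MK858264 | MK858263 | 72.53 | - | - | MH616676 | 47.17 | - | -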