Aedes aegypti from Amazon Basin Harbor High Diversity of Novel Viral Species

Viruses are the most diverse and abundant microorganisms on earth and are highly adaptive to a wide range of hosts. Viral diversity within invertebrate hosts has gained attention in public health in recent years, as several such viruses are of medical importance. Aedes aegypti serves as a vector for several viruses that have caused epidemics within the last year throughout Brazil, including Dengue, Zika and Chikungunya. This study aimed to identify new viral agents within Aedes aegypti mosquitoes in a city of the Amazonian region, where the species is highly endemic. Metagenomic investigation was performed on 60 mosquito pools, and viral RNA sequences present in their microbiota were characterized using genomic and phylogenetic tools. In total, we identified five putative novel virus species related to the Sobemovirus genus, Iflavirus genus and Permutotetraviridae family. These findings indicate a taxonomically diverse set of viruses in the mosquito microbiota of the Amazon, the region with the greatest invertebrate diversity in the world.


Introduction
Next-generation sequencing (NGS) is a revolutionary tool in molecular biology research. Thanks to NGS, the microbiota of a great number of insects has been explored, allowing the discovery of novel microorganisms, especially viruses [1][2][3]. Although the collective insect microbiota harbors many human pathogenic viruses, most viruses are non-pathogenic and have no direct public health impact.

Mosquitoes Collection
Mosquitoes (Diptera: Culicidae) were collected in the city of Macapá, Amapá state, North Brazil (see Figure S1), twice a month from January to March 2017. Electric manual aspirators and entomological nets were used to collect the mosquitoes. The mosquitoes were then transported to the laboratory, euthanized with ethyl acetate and morphologically identified using the dichotomous keys of Consoli and Lourenço-de-Oliveira [19]; legs and wings were removed. Between one and five females were grouped in pools according to their taxonomic category, place and date of collection. In total, 60 pools of mosquitoes were stored in a −80 °C freezer.
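The pooling rule described above (females grouped by taxonomic category, place and date, with one to five specimens per pool) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the `Specimen` type, its field names and the `make_pools` function are hypothetical, and the way groups larger than five are split is an assumption, since the text does not state it.

```python
from collections import defaultdict
from typing import NamedTuple


class Specimen(NamedTuple):
    # Hypothetical record for one identified female mosquito.
    species: str
    site: str
    date: str  # collection date, e.g. "2017-01-15"


def make_pools(specimens, max_size=5):
    """Group specimens by (species, site, date) and split each group
    into pools of at most `max_size` individuals."""
    groups = defaultdict(list)
    for s in specimens:
        groups[(s.species, s.site, s.date)].append(s)

    pools = []
    for key in sorted(groups):
        members = groups[key]
        # Assumption: larger groups are simply chunked in collection order.
        for i in range(0, len(members), max_size):
            pools.append((key, members[i:i + max_size]))
    return pools


catch = [Specimen("Ae. aegypti", "Central", "2017-01-15")] * 7
catch.append(Specimen("Ae. aegypti", "Marabaixo", "2017-01-15"))
pools = make_pools(catch)
print([len(members) for _, members in pools])
```

Every pool produced this way contains specimens from a single species, site and date, which is what allows a viral sequence to be traced back to a collection event.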

Sample Processing and Next Generation Sequencing (NGS)
The following metagenomic deep sequencing protocol was used. Initially, each mosquito pool was homogenized in a 2 mL impact-resistant tube containing lysing matrix C (MP Biomedicals, USA) and 900 µL of Hanks' buffered salt solution (HBSS). The homogenized sample was centrifuged at 12,000× g for 10 min, and approximately 300 µL of the supernatant was then filtered through a 0.45 µm filter (Merck Millipore, Billerica, MA, USA). Next, 100 µL of cold PEG-it Virus Precipitation Solution (System Biosciences, CA, USA) was added to the filtrate, mixed and incubated at 4 °C for 24 h. The mixture was then centrifuged at 10,000× g for 30 min at 4 °C and the supernatant was discarded. The pellet, rich in viral particles, was treated with a mix of nuclease enzymes to digest unprotected nucleic acids. Viral nucleic acids were extracted using the ZR & ZR-96 Viral DNA/RNA Kit (Zymo Research, CA, USA) according to the manufacturer's protocol. cDNA synthesis was conducted with AMV reverse transcriptase (Promega, WI, USA). Second-strand cDNA synthesis was conducted using DNA Polymerase I Large Fragment (Promega, WI, USA). A DNA library was then constructed using the Nextera XT Sample Preparation Kit (Illumina, CA, USA). The library was deep-sequenced using the HiSeq 2500 Sequencer (Illumina, CA, USA) with 126 bp ends. Bioinformatic analysis was performed according to the protocol previously described by Deng et al. [20]. The singlets and contigs were analyzed via BLAST (BLASTn and BLASTx) to look for similarity to viral sequences in GenBank's Virus.

Phylogeny and Viral Annotation
First, the viral sequences identified in this study were used to query the NCBI protein database with the BLASTp tool to determine the closest sequences, their taxonomic classification and their similarity. Second, based on the BLAST results, the best-hit sequences were downloaded and aligned using the online MAFFT software [21], and phylogenetic trees were constructed with PhyML [22] using the maximum likelihood approach. Branch support values were assessed using the approximate likelihood ratio test (aLRT) based on a Shimodaira-Hasegawa-like procedure. Evolutionary models and gamma distribution were selected according to the Bayesian information criterion (BIC) implemented in the jModeltest software [23]. Third, the ORFs were annotated using InterProScan [24] and the CD-search web tool with the CDD 3030 database and an e-value < 0.05 [25].
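Model selection by BIC, as mentioned above, amounts to choosing the candidate model with the lowest value of BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n is the sample size (taken here to be the number of alignment sites, an assumption) and L is the maximized likelihood. The sketch below only illustrates the criterion; the model names, log-likelihoods and parameter counts are invented for the example and are not results from this study.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """Return the name of the candidate with the lowest BIC.

    `candidates` maps a model name to (log_likelihood, n_params).
    """
    return min(
        candidates,
        key=lambda name: bic(candidates[name][0], candidates[name][1], n_sites),
    )


# Invented values, for illustration only.
candidates = {
    "simple_model": (-1000.0, 10),
    "complex_model": (-990.0, 30),
}
chosen = best_model(candidates, n_sites=500)
print(chosen)
```

In this toy case the more complex model fits slightly better but is penalized for its 20 extra parameters, so the simpler model is preferred.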

Results
A total of 60 pools of Ae. aegypti females were collected (each pool containing between 1 and 5 specimens; see the sampling locations in Figure S1 and the characteristics of the pools in Table S1), of which 24 were from Central and 36 were from Marabaixo, and were subsequently submitted to the NGS protocol. Raw data were processed and, after assembly, viral sequences were identified based on BLASTx similarity against the entire RefSeq database in GenBank (the viral richness of the pools that contained the viruses described in this study is summarized in Figures S4-S9, Supplementary Materials). We found nine virus-like sequences in two samples (AP59 and AP60) with <90% amino acid identity to different unclassified viruses, related to Sobemo-like virus, Iflavi-like virus and Permutotetra-like virus. These sequences represent five putative novel viruses, named Aedes Sobemo-like virus, Aedes Iflavi-like virus 1, Aedes Iflavi-like virus 2, Aedes permutotetra-like virus 1 and Aedes permutotetra-like virus 2 (Table 1). All sequences generated in this study were deposited in GenBank (GenBank accession: MT808014-MT808054) and are also available in a FASTA-formatted file (Supplementary Materials; File S1).

Sobemo-Related Virus
Two Sobemo-like virus sequences (3000 and 2768 nt) were identified in two pools and tentatively named Aedes Sobemo-like virus (ASLV). Sobemo-like viruses belong to an unclassified group distantly related to the International Committee on Taxonomy of Viruses (ICTV) Sobemovirus genus and Luteoviridae family. Sobemo-like viruses are widely found in insects and have a particular genomic organization; many of these novel viruses have (bi)segmented genomes, unlike viruses of the Sobemovirus genus and Luteoviridae family, which are monopartite [1]. We characterized two complete segment 1 sequences of ASLV, each of which possesses two open reading frames (ORFs) corresponding to a putative peptidase and an RNA-dependent RNA polymerase (RdRp), respectively (Figure 1a). The ASLV RdRp gene encodes a 450-aa protein with amino acid identity varying between 72% and 83% with hypothetical protein 2 of Wenzhou Sobemo-like virus 4, the closest aligned sequence in the BLASTx search, while the capsid shares 36-54% amino acid identity with the same virus (Table 1). ASLV has a typical overlapping reading frame with a −1 frameshift and a protein layout similar to that of other known sobemoviruses (Figure 1a).

In addition, eighteen viral genomes with complete coding regions similar to Guadeloupe Mosquito virus (GMV) were obtained from several of the analyzed pools. The viral genomes contain two segments: segment 1 (2400-3000 nt) encodes a putative peptidase and an RdRp protein (Figure 1a), and segment 2 (1000-1800 nt) encodes a putative capsid and one hypothetical protein (data not shown). Like ASLV, GMV, Wenzhou sobemo-like virus 4 and Hubei mosquito virus 2 are all currently unclassified viruses with a distant relationship to the Luteoviridae family and Sobemovirus genus [26]. Phylogenetic analysis based on segment 1 indicates that the Brazilian GMV sequences are highly similar to GMV, recently detected in Guadeloupe, and to Renna virus, isolated from Mexico City; they share about 100% nucleotide sequence identity and cluster into a single clade (Figure 1b).

Iflavi-Related Virus
One contig corresponding to the helicase gene and two contigs corresponding to the capsid and RdRp genes of two putative novel viruses, designated Aedes Iflavi-like virus 1 (AILV1) and Aedes Iflavi-like virus 2 (AILV2), were found in two mosquito pools (Figure 2a). All three fragments have low amino acid identity (<56%) with Yongsan picorna-like virus 1, the best BLASTx hit (Table 1). This low identity is reflected in the topology of the RdRp-based phylogeny, in which AILV1 and AILV2 group in a clade separate from other Iflavi-like viruses (Figure 2b). Similar topologies were observed in the helicase- and capsid-based ML phylogenies (Supplementary Materials). The phylogeny of the best BLASTx hits and Iflavirus members shows the diversity of Iflavi-like viruses, with all Iflavi-like viruses and Iflavirus members previously isolated from arthropods, mostly insects [27]. The AILV strains grouped into a cluster that shares a common ancestor with other viruses originally described in mosquitoes and is separate from other insect viruses (Figure 2b); however, only Yongsan picorna-like virus 1, AILV1 and AILV2 have been found in Aedes mosquitoes (unpublished). Since we used NGS to amplify viral sequences, it is possible that the contigs of AILV1 were derived from distinct viral genomes; likewise, the contigs of AILV2 may also have been amplified from distinct genomes. Nevertheless, the phylogenetic analysis showed that AILV1 and AILV2 are not the same virus and that they likely represent new species.

Permutotetra-Like Virus
We found three partial genomic segments from two putative novel Permutotetra-like viruses (Figure 3). An RdRp sequence (3321 nt) showed 53% amino acid identity with Culex Daeseongdong-like virus, the most similar virus. This putative novel virus was named Aedes permutotetra-like virus 1 (APLV1) (Figure 3a). Two capsid sequences (882 and 1219 nt) belonging to this group shared ~48% amino acid identity with the most similar virus, Sarawak virus. This putative novel virus was named Aedes permutotetra-like virus 2 (APLV2) (Figure 3b). The viruses in the cluster formed by APLV1, Culex Daeseongdong-like virus, Daeseongdong virus 2 and Smithfield permutotetra-like virus have been found in mosquitoes (Figure 3c) [3,28]; similarly, the viruses in the clade formed by APLV2, Culex permutotetra virus, Shinobi tetravirus and Sarawak virus have also been detected in different mosquito species (Figure 3d) [29][30][31].

Discussion
In this study, we analyzed 60 pools of Ae. aegypti and, as expected, found several highly divergent sequences, which possibly represent novel viral species. These viruses are related to the Luteo-sobemo-related viruses, the genus Iflavirus and the genus Alphapermutotetravirus.
One novel Luteo-sobemo-related virus, named Aedes sobemo-like virus (ASLV), was found in two Ae. aegypti samples. The sobemo-like viruses are unclassified (+) ssRNA viruses distantly related to the Sobemovirus genus and Luteoviridae family (Figure 1b), whose members infect plants and are known to be vectored by arthropods. Although viruses belonging to the Sobemovirus genus and Luteoviridae family are known plant viruses with monopartite genomes, Sobemo-like virus members have bi-segmented genomes and have been isolated primarily from insects [1,26]. It has been suggested that this group of viruses should be proposed as a new family [32].
Iflaviruses belong to a recently recognized family, Iflaviridae (order Picornavirales), within the genus Iflavirus. All Iflavirus members are insect-infecting viruses and have been identified in a wide range of hosts belonging to the class Insecta, although a plant-infecting Iflavirus-like virus has been reported from tomato (Solanum lycopersicum) [33]. Currently, fifteen species in the genus Iflavirus are recognized in the latest ICTV report [34]. The species demarcation criterion for the genus Iflavirus is amino acid sequence identity above 90% in the capsid proteins, and several tentative novel viruses that show sequence similarity to members of the genus have been identified yet remain classified as iflavi-like viruses. Through NGS analysis, we assembled five contigs that showed similarity to Iflavi-like viruses, here named Aedes Iflavi-like virus 1 (AILV1) and Aedes Iflavi-like virus 2 (AILV2). Because these contigs were assembled from NGS reads, it is possible that they belong to distinct viral genomes. Our analysis shows that AILV1 and AILV2 share a common ancestor but have diverged from each other and likely represent two new viral species. BLASTp searches showed that both AILV1 and AILV2 share low sequence identity (less than 90%) with other members of Iflaviridae at the amino acid level (Table 1), indicating that both are novel species of the Iflaviridae family [34]. Additionally, sequence analysis showed that AILV1 and AILV2 share 50% capsid amino acid sequence identity (data not shown), suggesting that they are members of different species. The phylogenetic analysis showed that AILV1 and AILV2 form a well-supported clade, suggesting that they represent a novel clade within the Iflaviridae family. According to ICTV, "The Iflaviridae family is expanding rapidly and will likely undergo revision in the near future", and new species and genera will possibly be included in the official taxonomy.
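The demarcation rule discussed above (capsid amino acid identity above 90% indicates the same species) can be illustrated with a small sketch. The functions and the toy aligned sequences below are hypothetical, and the identity definition (matches divided by columns in which neither sequence has a gap) is an assumption; alignment tools differ in the denominator they use, so this is not necessarily how the identities in Table 1 were computed.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are ignored (an assumption;
    other conventions count gapped columns in the denominator).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_candidate_new_species(identity_pct, threshold=90.0):
    """True when identity falls below the species demarcation threshold."""
    return identity_pct < threshold


# Toy aligned fragments, invented for illustration.
identity = percent_identity("MKT-LV", "MKTALI")
print(identity, is_candidate_new_species(identity))
```

A value below the threshold only flags a sequence as a candidate; as noted in the text, partial contigs and phylogenetic context still have to be considered before proposing a species.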
Additionally, we detected three partial genomes of two putative novel viruses (APLV1 and APLV2) closely related to unclassified permutotetra-like viruses. Permutotetraviridae is a recently classified family with a single genus (Alphapermutotetravirus) and two prototype species (Euprosterna elaeasa virus and Thosea asigna virus), restricted mainly to insects in the order Lepidoptera (butterflies and moths). In recent years, a wide range of highly divergent viruses distantly related to the Permutotetraviridae family has been identified [1,3,30,35]. The lack of a common genomic organization among permutotetra-like viruses and the formation of two large and well-supported clades (Figure 3) support the need to create new groups for the currently unclassified viruses of this family. Thus, the permutotetra-like viruses found in this study (APLV1 and APLV2) may represent new species within different genera or families. Both viruses group with viruses isolated only from mosquitoes, indicating a likely common origin within their respective clades (Figure 3c,d).
All novel viruses reported here share a common ancestor with other viruses originally described in mosquitoes and are separate from other insect viruses, suggesting close evolution with their mosquito hosts. Recent phylogenetic studies of several RNA insect-virus families have indicated that they are ancient agents with highly distinct lineages, lending credence to probable co-evolution and expansion with their insect hosts [10,36,37]. The hypothesis that insect viruses have been closely associated with their insect hosts for a long period of time is supported by studies that demonstrate vertical/transovarial transmission (TOT), and some viruses have become integrated into the genomes of their arthropod hosts [38][39][40]. Another possibility regarding insect-virus evolution is host association, whereby dual-host viruses evolved from insect-specific progenitors, with many arthropod-borne viruses possibly emerging in vertebrates and plants in this way, in some cases adapting completely to vertebrate or plant hosts and thereby losing the need for an invertebrate host [2,41]. Importantly, none of these novel viruses are closely related to known vector-borne pathogens of humans or other mammals.
The alignments used to construct the maximum likelihood (ML) trees have a poor phylogenetic signal, as determined by the high proportion of star-like trees in the likelihood mapping analysis. Although the low quality of the alignments has little effect on the topology of trees constructed with other methods, it may significantly affect estimates of mutation rates or of divergence times. Consequently, the interpretation of the data may change in the future as new viral sequences are identified.
The diversity of arboviruses remains to be explored, especially in the Amazon, known for being a rich biome with many viral species. For example, the Amazon region has been identified as the starting point for transmission of the yellow fever virus in a recent outbreak in Brazil that killed almost 700 people between December 2016 and March 2018, and the Amazon rainforest functions as a 'reservoir' region for several arboviruses [42]. Other studies have identified novel viruses present in the mosquito microbiota in this region, in Aedes aegypti and anopheline hosts [17,18,43], reinforcing the idea that our current knowledge about the diversity of viruses is still very limited. Furthermore, insect-specific viruses (ISVs) compose the majority of the mosquito virome, and virus-virus interactions may affect the transmission of some viral pathogens [44,45]. From the description and characterization of these viral agents, we can gain knowledge useful for biotechnological strategies to combat epidemic viruses such as Dengue, Zika and Chikungunya.
Recent large-scale metagenomic studies have expanded our knowledge of the diversity of a great number of invertebrate viruses, which include many unclassified groups such as luteo-sobemo-like viruses, Ifla-like viruses and permutotetra-like viruses, among others [1][2][3]46]. Official ICTV recognition of these taxonomic proposals requires time and the analysis of large quantities of sequences; therefore, the identification of novel viral sequences in this study contributes to the correct taxonomic classification of these "virus-like" sequences. Ultimately, our results highlight the importance of identifying and characterizing novel viruses to expand our understanding of the taxonomic diversity of viral groups (families, genera and species), which is currently poor. The host range and vector biology of these viruses, alongside the ecological and evolutionary history of the microbiota of Aedes aegypti, the principal arboviral vector in Brazil, need to be studied further.