Diversity and Evolution of Viral Pathogen Community in Cave Nectar Bats (Eonycteris spelaea)

Bats are unique mammals, exhibit distinctive life history traits and have unique immunological approaches to suppression of viral diseases upon infection. High-throughput next-generation sequencing has been used in characterizing the virome of different bat species. The cave nectar bat, Eonycteris spelaea, has a broad geographical range across Southeast Asia, India and southern China, however, little is known about their involvement in virus transmission. Here we investigate the diversity and abundance of viral communities from a colony of Eonycteris spelaea residing in Singapore. Our results detected 47 and 22 different virus families from bat fecal and urine samples, respectively. Among these, we identify a large number of virus families including Adenoviridae, Flaviviridae, Reoviridae, Papillomaviridae, Paramyxoviridae, Parvoviridae, Picornaviridae, and Polyomaviridae. In most cases, viral sequences from Eonycteris spelaea are genetically related to a group of bat viruses from other bat genera (e.g., Eidolon, Miniopterus, Rhinolophus and Rousettus). The results of this study improve our knowledge of the host range, spread and evolution of several important viral pathogens. More significantly, our findings provide a baseline to study the temporal patterns of virus shedding and how they correlate with bat phenological trends.


Introduction
The advent of next generation sequencing (NGS) technologies has drastically increased the discovery of novel viruses and estimates of virus diversity [1,2]. Though family-level specific primers are often used to screen diagnostic samples, they are designed based on available reference sequences and typically target the most conserved genetic region such as the polymerase genes [3][4][5]. These assays often lack sensitivity at the expense of detecting an entire family of viruses. Next generation sequencing can detect viruses at low concentrations and often provides sequence reads from across the entire genome, providing sites for primer walking and gap closing [6][7][8]. This approach can also detect divergent lineages that may not be amplified using traditional polymerase chain reaction (PCR) approaches [9]. However, as host and bacterial components can dominate sequencing reads, virus reads of interest tend to be in low abundance in these data sets [10].
NGS has been employed to detect zoonotic pathogens and in numerous cases of virus discovery and metavirome descriptions, including bats in China, Myanmar, New Zealand and North America [11][12][13][14][15]. This technique has also detected novel poxviruses and adenoviruses, divergent papillomaviruses, and unique paramyxoviruses [16][17][18]. More recently, genetic regions from a novel filovirus were identified from Rousettus bat in China in 2015, indicating the utility of this approach to ascertain the presence of potentially pathogenic viruses in reservoir hosts [19].
Increasingly, these studies have focused on bats because this group of animals are unique virus reservoirs. Bats are distinctive mammals, having ecological, immunological and behavioral attributes that set them apart from other orders. Bats are exceptionally speciose, comprising 20% of all mammalian species and are the only mammals that are capable of true flight [20]. Many species are gregarious and roost in large colonies, which can number over one million individuals [21]. They are relatively long-lived for their body size and temperate species often undergo torpor or hibernation [22]. There are several theories regarding why bats are exceptional viral reservoirs and rarely experience pathogenesis with infection. One is that they have to deal with the physiological stress of flight, with increased metabolic rates and a subsequent increase in reactive oxygen species [23,24]. Recent research demonstrated that the unique innate immune system of bats may allow them to co-exist with viruses, maintaining very low levels of viremia or keeping viruses in a quiescent state [25][26][27].
Eonycteris spelaea is a nectivorous bat with a distribution ranging from the Malay-Indonesian archipelago to southern China, extending west into the Indian subcontinent [28]. These cave-roosting bats are especially important pollinators of durian [29]. There are two known colonies of E. spelaea in Singapore and both populations are restricted to roosting under bridges. This species forage at distances greater than 30 km from their roosting site, but are threatened across their range by habitat loss and hunting for human consumption [30]. Here we perform next generation sequencing on pooled feces and pooled urine samples collected from one colony of E. spelaea to identify the viral diversity and to examine the difference in virus communities between fecal and urine samples. Furthermore, we conducted phylogenetic analysis to understand the evolutionary relationships of different families of viruses detected from this study.

Sample Collection for NGS Library Preparation
Urine and fecal samples were collected from 3 time points from a colony of the cave nectar bats (Eonycteris spelaea) in Singapore for NGS. Feces were collected on 14 March, 28 March and 11 April 2013 while urine was collected on 24 April, 8 May and 20 May 2014. Disposable plastic drop cloths were placed under the colony and approximately 25 g of feces was collected and placed in a 50-mL tube. Urine samples from approximately 100 bats was placed into viral transport media (penicillin, streptomycin, polymyxin B, gentamicin, nystatin, olfoxacin, sulfamethoxazole). Fecal material was prepared for library preparation as previously described [31] while urine was centrifuged at 10,000 × g for 3 min to pellet debris. RNA-zol was added to urine viral supernatant and RNA extracted using Direct-zol™ RNA MiniPrep (Zymo Research Corporation, Irvine, CA, USA), then subjected to in-column DNase I digestion (New England BioLabs Inc., Ipswich, MA, USA). Extracted and DNase I digested RNA was treated with Ribo-Zero™ Gold rRNA Removal Kit (Epidemiology) (Epicentre, Madison, WI, USA) and 500 bp cDNA libraries were constructed using NEBNext ® Ultra™ Directional RNA Library Prep Kit for Illumina ® (New England BioLabs Inc., Ipswich, MA, USA) and visualized on 1.5% agarose gel, before being excised and purified using Zymoclean Gel DNA Recovery Kit (Zymo Research Corporation, Irvine, CA, USA). All sequencing reactions were run at the Duke-NUS Genome Biology Facility. Libraries from urine were run on an Illumina MiSeq machine with paired ends and a read length of 2 × 250 bp, while fecal sample libraries were processed as described previously and run on an Illumina HiSeq 2000 with paired ends and a read length of 2 × 76bp.

NGS Data Analysis
All FASTQ files were assessed using FastQC to assess overall quality [32]. Trimming was executed using Trimmomatic-0.3.2 to remove adapters, low quality bases (Q = 20 with a sliding window 4) and reads with fewer than 50 bp length [33]. Taxonomic read classification was performed with DIAMOND sequence similarity searches against a local National Center for Biotechnology Information (NCBI) nr (non-redundant) protein database [34]. DIAMOND outputs were imported into MEGAN6 for the taxonomic binning of reads and visualizing the distribution of virus family reads [35]. X174 phage reads were removed from the final data set as these are spiked-in as control for the next generation sequencing reaction. Virus family reads were exported for phylogenetic analysis. To confirm whether the reads sorted by MEGAN were true positives, these were verified as viral hits using the BLASTX tool from the National Center of Biotechnology Information.

Sample Collection and PCR Assays for Detection of Specific Viruses
Individual urine samples were collected from 59 time points and pooled by date (2014-03-27 to 2016-09-01). Pooled Eonycteris urine samples were centrifuged at 10,000 × g for 1 min. RNA was extracted from the supernatant using QIAamp ® Viral RNA Mini Kit (#52906; Qiagen Duesseldorf, Germany) and cDNA made with Random Hexamers and SuperScript ® II Reverse Transcriptase (#18064-014, Invitrogen, CA, USA). This was used to screen for orthoreovirus and paramyxovirus RNA-dependent RNA polymerase (RdRp) gene with family-specific primers [5,36]. PCR products were visualized on a 1.5% agarose gel, and 500 bp bands from the paramyxovirus assay and 240 bp bands from the orthoreovirus assay were excised, gel purified with Qiagen QIAquick Gel Purification kit and sent for Sanger sequencing with both forward and reverse reads sequenced (1 st Base DNA Sequencing Services, Axil Scientific Pte Ltd., Singapore).

Phylogenetic Analysis
Candidate reads from mammalian-specific viruses were de novo assembled in Geneious 7.1.6 (Biomatters Ltd., Auckland, New Zealand) and consensus sequences were used for subsequent phylogenetic analysis [37]. Representative nucleotide sequences specific to the gene of interest were downloaded from the NCBI GenBank for the following virus families (Table 1): Adenoviridae, Flaviviridae, Reoviridae, Papillomaviridae, Paramyxoviridae, Parvoviridae, Picornaviridae, and Polyomaviridae. For each viral family, individual sequence data sets were aligned using Transalign [38] and MAFFT [39] followed by manual curation of alignments. Gene phylogenies were initially reconstructed using FastTree [40], and the final data sets were further down sampled to reduce redundant and similar sequences. Altogether, 15 individual data sets were analyzed based on the following viral genes: adenovirus polymerase, flavivirus envelope, flavivirus NS5, paramyxovirus nucleoprotein, paramyxovirus polymerase, parvovirus VP1, parvovirus VP2, picornavirus 3D, picornavirus polyprotein, orthoreovirus M2, orthoreovirus L2, rotavirus VP1, rotavirus VP7, polyomavirus VP2, and 36 papillomavirus E1 (Table 1). Individual gene phylogenies were reconstructed using RAxML [41] with a general time reversible model and robustness of the nodes was assessed using 1000 bootstrap replicates.

Next Generation Sequencing Analysis
Four NGS data sets generated from the pooled urine and fecal samples: two libraries (Urine-25 and Urine-27) were constructed from pooled urine that were run on Illumina MiSeq, while two other libraries (Feces-MiSeq and Feces-HiSeq) from pooled feces were sequenced using Illumina MiSeq and HiSeq, respectively. For urine samples, a total of 5,126,632 reads from Urine-25 and 13,421,263 reads from Urine-27 were generated, and approximately 29.8% and 30.8% of respective reads could be assigned to known sequences in the GenBank nr database by DIAMOND (Table 2). For the fecal data set, a total of 68,584,413 reads from the Feces-HiSeq and 4,952,973 reads from the Feces-MiSeq were generated, but the percentage of assigned reads from the data sets differed dramatically (5.8% from Feces-HiSeq and 75.7% from Feces-MiSeq). The majority of assigned reads in the fecal and urine samples were from Eukaryota. Reads assigned to bacteria amounted to between 11.2-14.6% of the four NGS data sets. Fungal sequences were rare in the urine data sets, but common in the fecal data sets, comprising 16.9% of assigned reads in Feces-HiSeq and 35.4% of assigned reads in Feces-MiSeq. Viral reads were a relatively small component, totaling approximately 1% of all assigned reads from the four data sets. All NGS data sets are available at the National Center for Biotechnology Information Sequence Read Archive under bioProject PRJNA524946.

Adenoviridae
Of the 360 total adenovirus reads from the fecal and urine data sets, there were 39 contigs assembled with 17 reads unassembled. The longest contig was 539 bp, while the shortest was 115 bp. A 504 bp contig was similar to the mamadenovirus polymerase genes. The polymerase phylogeny ( Figure S1) of adenovirus clearly indicates Eonycteris adenovirus is well nested within a monophyletic clade (72% bootstrap support, BS) of bat adenovirus from Rousettus and Minopterus species. The new adenovirus sequence from Singapore (MK603133) is most closely related to two Rousettus leschenaultii adenovirus sequences (KX961095 and KX961096) from China in 2013. They share a high level of nucleotide (nt: 86.6%) and amino acid (aa: 99.4%) similarities.

Flaviviridae
Two contigs were assembled from 62 flavivirus reads from the fecal and urine data sets. These were mapped to an envelope (E) protein and a NS5 protein, which encodes methyltransferase and RNA-dependent RNA polymerase, respectively. The E phylogeny ( Figure S2) of Flavivirus indicates that the Eonycteris sequence (MK603134) closely resembles Phnom Penh virus from a Cynopterus bat (NC_034007), with 88.4% nucleotide (nt) similarity and 92.7% amino acid (aa) similarity. They formed a strongly supported monophyletic clade (100% BS) with Batu Cave virus from Malaysia (KJ469370). Similar observations are found in the NS5 phylogeny ( Figure S2), although the Eonycteris sequence (MK603135) appears to be more closely related to Batu Cave virus (58% BS; nt: 92.5%, aa: 100% similarity).

Reoviridae
The 172 reovirus reads from the fecal data set were assembled into 41 contigs with 23 unassembled reads. The longest contig was 487 bp, while the shortest was 76 bp. There were members of the genera Orthoreovirus and Rotavirus in the data set. Four contigs were selected for phylogenetic reconstruction.

Papillomaviridae
There were 12 papillomavirus reads from the fecal data set and 9 were assembled to produce 3 contigs (L1 protein and two E1 protein contigs), with a maximum length of 635 bp and a minimum length of 77 bp. One contig (439 bp) was similar to the V3 gene which encodes the minor capsid protein. The papillomavirus E1 phylogeny ( Figure S3) indicates the Eonycteris sequence (MK603138) is most similar to an Eidolon helvum papillomavirus from Cameroon (KX276956). These two sequences have 68.4% aa similarity and 69% nt similarity. These two sequences were in a monophyletic group with another Eidolon helvum papillomavirus from Cameroon (KX276957) (100% BS).

Paramyxoviridae
There were 57 reads from the fecal and urine data sets were classified as from Paramyxoviridae of which 56 were assembled to produce 6 contigs. The longest contig was 338 bp and the shortest was 127 bp and there were sequences from the polymerase gene (L) and nucleocapsid (NP).

Parvoviridae
There were 565 Parvovirinae reads from the fecal data sets. These produced 43 contigs with a maximum length of 891 bp and a minimum length of 120 bp. Two contigs matched with the capsid gene (VP1 and VP2) of parvovirus. The VP1 contig (321 bp) from Eonycteris (MK603143) is related to a Rousettus leschenaultii parvovirus from in China (100% BS) (MF682925), and they shared a 73.5% nt similarity and 76.6% aa similarity. These two sequences also formed a strongly supported monophyletic clade with porcine bocaviruses (97% BS) (Figure 4). Similar to the VP1 gene, the VP2 sequence (MK603144) (319 bp) from Eonycteris is sister to several porcine bocaviruses, forming a well-supported clade (87% BS) (Figure 4).

Picornaviridae
There were 159 reads from the fecal data sets classified in Picornaviridae. These created 33 contigs, the longest at 724 bp and the shortest 76 bp. One contig was from the polyprotein region and picornavirus phylogeny ( Figure S4) indicates the Eonycteris sequence (MK603145) was sister to a monophyletic clade (100% BS) comprising three bat picornaviruses (KJ641693, HQ595345, NC_015934) from Rhinolophus hipposideros, R. sinicus and Hipposideros armiger from China and Hong Kong. These shared a 68.4% nt similarity and a 51.7% aa similarity. Another read matched the 3D gene that encodes for RNA polymerase. The 3D gene from Eonycteris (MK603146) ( Figure S4) was related to bat picornavirus from Vespertilio superans and Myotis altarium from China (72.8% nt and 79.2% aa similarity), although this node was not statistically supported.

Additional Viruses Detected
In addition to the above-mentioned viruses identified from fecal and urine samples of Eonycteris bats in Singapore, there were other virus families with a low number of reads (<15). For instance, one read (114 bp) was identified as a bunyavirus sequence that is most similar to the L protein of phlebovirus found in blacklegged ticks. Three reads corresponded to Taterapox virus (Poxviridae), originally isolated from an African gerbil and 15 reads for herpesviruses. Phylogenetic relationships of these viral pathogens were not reconstructed due to a lack of reference sequences in GenBank.
We reconstructed individual gene phylogenies for each of the above-mentioned virus families. In most phylogenies, our novel viral sequences of Eonycteris spelaea are clustered within a group of bat viruses from other genera, often from the same family (Pteropodidae), indicating circulation of these viruses among different bat species. Detection of novel viruses may not provide direct information on zoonotic capacity, but the generated sequence data can allow us to better understand the evolutionary history of these virus families and infer potential cross-species transmission [77]. For instance, our VP1 phylogeny of parvovirus indicates that bat parvoviruses are sister to murine and porcine parvoviruses, whereas the VP2 phylogeny shows bat parvoviruses are closely related to canine, feline and porcine parvoviruses. It is apparent that parvovirus is capable of infecting a broad host species, although further research is needed to understand how the virus jumps to different hosts.
Viruses exhibit specific tissue tropisms based on available cellular receptors and compatibility of the intracellular environment. Detection of these viruses depends on what tissue or sample type is being screened. Previous studies have indicated which viruses are more likely to be shed in feces (adenoviruses, astroviruses, parvoviruses, picornaviruses), urine (paramyxovirues), or tissues [60,[78][79][80][81]. Receptors can be widely available, such as in paramyxovirus (CD46 in measles and sialic acid in Sendai virus) and picornavirus infections (ICAM-1 in Coxsackie virus), or narrowly restricted in adenovirus infections (integrin on monocytes) [82]. Due to limited commercial reagents and the difficulty of maintaining experimental colonies, little work has been done to characterize receptors and viral tissue preference in bats, though progress is being made in determining coronavirus tropism [83][84][85].
Consensus family level primers are used globally for virus biosurveillance and amplify the most conserved genomic region, usually the polymerase, but these primers may be of limited utility in detecting divergent strains [86]. Unbiased NGS reads will capture reads scattered across the virus genome, depending on the starting quantity of the sample and virus [10]. As sequencing reactions generate reads from all nucleic acids, reads can end up binned as unknown because there are no similar reads available in reference data sets (GenBank nucleotide and non-redundant RefSeq proteins). To minimize having unassigned reads, our approach used the program DIAMOND and the nr data set to only assign sequences that have homology to coding regions, eliminating false assignments of ribosomal RNA [34]. Our NGS sequencing of bat fecal and urine samples resulted in a low proportion of viral reads. This is common in metagenomic data sets where the majority of assigned reads were from the host, bacteria and viruses that infect plants, bacteria and insects [14]. These large data sets often provide low coverage across the genomes of viruses due to the host and bacterial background [87]. Interestingly, in our study, more viral reads were generated from the MiSeq Illumina than the HiSeq Illumina. Interestingly, retrovirus reads were much more common in the urine data set compared to the fecal data set. This may be caused by the comparatively low background in the urine.
In this study, we characterized the fecal and urine virome of Eonycteris spelaea, an ecologically important species [30]. The NGS data sets provided sequencing reads for 10 families of viruses that are known to infect mammals, providing segments to develop strain-specific assays to detect and quantify these viruses for future longitudinal studies. Recent research has demonstrated that co-roosting and colony size may be important in the generation of novel variants and in viral maintenance [88,89]. As this species roosts in large numbers, co-roosts with several other species of bats, and is widely distributed, it may be a candidate species to understand if virus diversity is based on phylogenetic relatedness, co-roosting partners or geographic separation. Additionally, with the sequencing of the full genome and the presence of a captive colony of bats at Duke-NUS Medical School, this will expand our capacity in understanding the genomics, infection and immunology of host-virus interactions using E. spelaea as the studied species [90].
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/3/ 250/s1, Figure S1: Phylogenetic relationships of the V4 gene sequences of adenovirus, inferred by using the maximum-likelihood method with the GTR+GAMMA model in RAxML; Figure S2: Phylogenetic relationships of the E and NS5 gene sequences of flavivirus, inferred by using the maximum-likelihood method with the GTR+GAMMA model in RAxML; Figure S3: Phylogenetic relationships of the E1 gene sequences of papillomavirus, inferred by using the maximum-likelihood method with the GTR+GAMMA model in RAxML; Figure S4: Phylogenetic relationships of the polyprotein and 3D gene sequences of picomavirus, inferred by using the maximum-likelihood method with the GTR+GAMMA model in RAxML; Figure S5: Phylogenetic relationships of the VP2 sequences of polyomavirus, inferred by using the maximum-likelihood method with the GTR+GAMMA model in RAxML. Disease, National Institutes of Health, US Department of Health and Human Services, USA. We would like to thank KL Okihara for her assistance in lab work and D. Zaini and X. Song for field collections.