Viral Metagenomic Profiling of Croatian Bat Population Reveals Sample and Habitat Dependent Diversity

To date, the microbiome, as well as the virome of the Croatian populations of bats, was unknown. Here, we present the results of the first viral metagenomic analysis of guano, feces and saliva (oral swabs) of seven bat species (Myotis myotis, Miniopterus schreibersii, Rhinolophus ferrumequinum, Eptesicus serotinus, Myotis blythii, Myotis nattereri and Myotis emarginatus) conducted in Mediterranean and continental Croatia. Viral nucleic acids were extracted from sample pools, and analyzed using Illumina sequencing. The presence of 63 different viral families representing all seven Baltimore groups were confirmed, most commonly insect viruses likely reflecting the diet of insectivorous bats. Virome compositions of our samples were largely impacted by the sample type: invertebrate-infecting viruses were most frequently found in feces, bacterial viruses in guano, whereas vertebrate-infecting viruses were most common in swabs. Most vertebrate-infecting virus sequences were assigned to retroviruses, parvoviruses, iridoviruses, and poxviruses. We further report the complete genome sequence of a novel adeno-associated virus, densovirus and a near complete length genome sequence of a novel iflavirus. Additionally, one of the most interesting findings in this study was the difference in viromes between two contrasting habitats, the continental and Mediterranean Croatia.


Introduction
Bats are the second most species-rich taxonomic group of mammals after rodents, representing 20% of mammalian diversity. Part of the Chiroptera taxonomical order, these flying mammals inhabit all continents except Antarctica [1]. Bats play an essential role in ecosystems globally and humans benefit from their presence in many ways due to their roles for example in seed dispersal, pollination and their guano being used as an organic fertilizer [1]. Many bats, including all European species, are insectivorous and prey on several insect species that can cause high economic losses [1,2].

Figure 1.
Locations of bat sampling in continental (green) and Mediterranean (blue) Croatia. Source: [19]; permission is granted to copy, distribute and/or modify this map since it is based on free copyright.
Bats were captured and handled by bat biologists. Captures were facilitated using mist nets (Ecotone Mist Nets, Ecotone, Poland) at the entrances of the caves during night (locations 1, 4, 10 and 11) or using hand nets inside the colony dwellings during the day (locations 2, 3, 5-9). Sampling was repeatedly conducted over the time span of two consecutive years at locations 3, 4 and 11. Recapture of bats at these sites was not evaluated, because the previously captured bats were not marked. During sampling, bats were placed into cotton bags individually and species were determined by bat biologists according to morphological criteria [2]; age, body mass, forearm length, sex, and reproductive status were recorded.
Three types of samples were collected throughout this study. Oral swabs (swabs, i.e., saliva) were collected from individual bats, feces samples were collected from individual bats defecating at the time of sampling, and guano samples were collected from below the colonies where available. Swabs and feces samples were preserved in 500 μL of nucleic acid stabilization reagent (DNA/RNA Shield; Zymo Research, Irvine, CA, USA), already at the time of sampling, whereas guano was only resuspended in phosphate-buffered saline (PBS; 10% wt/vol) immediately before further processing in the laboratory.
All procedures including capture, handling of bats, as well as sample collection, were carried out in accordance with the ethical guidelines and permit delivered by the Croatian State Institute for  [19]; permission is granted to copy, distribute and/or modify this map since it is based on free copyright. Three types of samples were collected throughout this study. Oral swabs (swabs, i.e., saliva) were collected from individual bats, feces samples were collected from individual bats defecating at the time of sampling, and guano samples were collected from below the colonies where available. Swabs and feces samples were preserved in 500 µL of nucleic acid stabilization reagent (DNA/RNA Shield; Zymo Research, Irvine, CA, USA), already at the time of sampling, whereas guano was only resuspended in phosphate-buffered saline (PBS; 10% wt/vol) immediately before further processing in the laboratory.
All procedures including capture, handling of bats, as well as sample collection, were carried out in accordance with the ethical guidelines and permit delivered by the Croatian State Institute for Nature Protection (consent number: UP/I-612-07/16-48/163). All bats were successfully released at the location of their capture after sample collection.

Sample Preparation and Viral Nucleic Acid Extraction
Oral swabs were pooled according to bat species, sampling date and location, with two exceptions: (i) since only one Myotis nattereri was caught it swab was pooled with swabs from Miniopterus schriebersii from same location; and (ii), although only one E. serotinus was caught, this animal was examined independently because it has been reported as a rabies reservoir previously [20]. In conclusion, there were a maximum of 15 individual swab samples per pool.
All samples were vortexed and centrifuged at 13,000× g for 5 min. Consequently, the supernatants were filtered (0.22-µm filters, Millipore, Burlington, NJ, USA) and the filtrates were subjected to nuclease treatment (100 U DNase I, New England Biolabs, Ipswich, MA, USA) at 37 • C for 1 h followed by automatic nucleic acid isolation using iPrepViral Kit and the iPrep instrument (Invitrogen, Carlsbad, CA, USA). Ribosomal RNA was depleted using 30 µL of RNA, 3 µL of Reaction Buffer A, 0.5 µL Riboguard RNase inhibitor and 1 µL Terminator 5 -Phosphate-Dependent Exonuclease (Epicentre Biotechnologies, Madison, WI, USA) as described previously [21]. After the subsequent round of purification using RNA Clean and Concentrator (Zymo Research), the cDNA Synthesis System Kit (Roche Diagnostics, Basel, Switzerland) was used for double-stranded cDNA synthesis, carried out according to the manufacturers' instructions.

Library Construction and Nextera XT Illumina Sequencing
Double-stranded DNA was quantified using Qubit fluorimeter (Life Technologies, Carlsbad, CA, USA) and diluted to a final concentration at 0.2 ng/µL. Sequencing libraries were prepared using Nextera XT sample preparation kit and the Nextera index kit (Illumina, San Diego, CA, USA) according to manufacturers' instructions, using 5 µL of diluted dsDNA. Finally, the libraries were sequenced using the 500-cycle MiSeq reagent kit v2 on the MiSeq platform (Illumina).

Viral Metagenomic Profiling
For the purpose of viral metagenomic profiling, the obtained read pairs were merged based on minimum overlap length of 40 nucleotides (nt) displaying a minimum sequence identity of 90%, using CAP3. Merged sequences, longer that 100 nt, were considered in further metagenomic analyses. The sequences were then compared to the GenBank non-redundant protein database (downloaded 21 June 2018 from National Center for Biotechnology Information (NCBI): [22]) using Diamond BLASTx (version GitHub commit: [23,24]), adopting an E-value cut-off of 10 −4 . The Diamond BLASTx search was restricted to the database protein sequences that corresponded to the subset of Viruses (taxid: 10239). The Diamond BLASTx output was used to assign taxonomic classifications to the merged sequences with MEGAN Community Edition (version 6.11.7, built 11 June 2018, [25]). Taxonomic classification output was further processed, summarized and visualized using custom procedural scripts in Bash and Python programming languages, using the functionality provided by numpy, scipy, matplotlib, scikit-learn and pandas python 2.7 modules. Host range information was obtained from the NCBI Taxonomy portal by parsing the html files related to the relevant taxon, using a custom procedural script. It is noteworthy that the lowest level taxon that could be obtained for a given read pair-and that also included a host range description on the NCBI Taxonomy pages-was used, and the information was subsequently summarized upwards until the taxonomical level of family. Data availability: BioProject: PRJNA433098.

Complete Viral Genome Assembly, Identification and Taxonomic Classification
De novo assemblies, aimed to reconstruct complete genome sequences of the viruses in the samples were obtained using SPAdes 3.12.0 [26]. The de novo assembled contigs, longer than 1000 nt, were also searched against the non-redundant protein database, as described above, contigs indicating protein hits were then used in NCBI blastn and NCBI blastx [27] searches to help guide taxonomical classification. Completeness of the de novo-assembled viral genomes was determined by comparing certain characteristics the query contigs and the preliminary taxonomical units assigned to them, including genome length, gene content, sequence similarity (complete and on the gene level), sequence features at the contig flanks, stem-loop signals, etc., as described at the relevant International Committee on Taxonomy of Viruses (ICTV) webpage (https://talk.ictvonline.org/taxonomy/).

Phylogenetic Analyses
Sequence alignments were prepared using Muscle v3.8.31 [28], maximum likelihood phylogenetic trees were constructed using IQ-TREE v1.6.10 [29]. Reliabilities of phylogenetic trees were evaluated using the SH-like approximate likelihood ratio test (SH-aLRT, [30]) with 1000 replicates, the abayes test and the ufbootstrap [31] procedure with 1000 replicates; best fitting phylogenetic models were selected automatically using IQ-TREE functionality [32]. The GeneBank accession numbers of the viral sequences used in phylogenetic analyses are shown on tree figures. Trees were visualized using FigTree (v1.7).

Statistical Analysis
The zeroth-and first-order Hill diversity numbers [33] were calculated as measures of viral diversity, based on the taxonomic and host range classification outputs (denoted: 0 D tax , 1 D tax , 0 D host , 1 D host ). While the zeroth order diversity number ( 0 D) addresses the number of different taxa identified in each sample, its first-order counterpart ( 1 D) introduces a measure of abundance of individual detected taxa into the metric.
Comparisons of numerical variables of multiple groups were carried out using the Kruskal-Wallis test [34] and statistical significance was determined at a threshold p-value level of 0.01. Correlation of between pairs of variable vectors was based on the Spearman rank correlation coefficients (ρ).

Bats, Locations and Sampling
Samples were collected from a total of 455 bats. Most sampled bats were identified as Miniopterus schreibersii (n = 255), followed by R. ferrumequinum (n = 90), Myotis myotis (n = 56), Myotis emarginatus (n = 10) and Myotis blythii (n = 27). Only single E. serotinus and Myotis nattereri each were caught and examined. One bat escaped before its species and sex could be determined and it was not included. Fourteen bats of the Myotis genus could not be classified to the species level confidently based on morphological criteria and were excluded from the study as well.

Viral Metagenomic Profiling
Details regarding the sample composition of each pool are provided in Table 1. A total of 7,536,096 reads were obtained through sequencing and read pair assembly yielded 1,565,543 merged sequences (termed "sequences") that were further used for viral metagenomic composition analysis. Of these, 39,527 sequences (2.52%) were attributed to viruses by viral metagenomic profiling (Table 1). Most sequences were obtained from swab samples (n = 1,107,621; 70.75% of total sequences), which represented 65.12% of samples. Only 0.47% (n = 5199) of these could be attributed to viruses. Fecal samples displayed both the highest content of viral sequences (50.47%) and contribution to all viral sequences obtained during this study (52.49%) ( Table 1). Of all viral sequences, 75.26% and 43.18% identified originated from Mediterranean locations and locations surrounded by natural landscape, respectively ( Table 1). The bat species R. ferrumequinum displayed the highest number of viral sequences (n = 19,060), followed by Miniopterus schreibersii (n = 1634), Myotis myotis (n = 955), E. serotinus (n = 452), Myotis blythii (n = 380) and Myotis emarginatus (n = 236). A large number of viral sequences (n = 16,810), second in rank, originated from samples that could not be attributed to a single species of bat (Table 1).

Eukaryotic Versus Prokaryotic Viruses
Most viral sequences identified were attributed to eukaryotic viruses (67.13%), while 27.16% of viral sequences represented prokaryotic (bacterial or archaeal) viruses. Invertebrate (83.18%) and vertebrate (12.73%) viruses represented the highest overall percentages of eukaryotic viral sequences, followed by protozoan (3.04%), plant or algal (2.27%), and fungal (0.02%) viruses. A total of 1.02% of viral sequences were attributed to eukaryotic viruses with a wide host range-attributed to more than one of the host range classes described above; 5.71% of viral sequences were attributed to viruses with unknown host ranges ( Table 2).

Viral Metagenomic Profiling Revealed Presence of 63 Viral Families; Most Common Were Viruses Infecting Eukaryotes-Insects
The combined dataset indicated traces of 63 families of viruses, belonging to all seven Baltimore groups, including viruses infecting bacteria or archaea, insects, fungi, mammals, and plants or algae, as well as viruses that can infect various eukaryotes and those whose host range remains undefined (Tables 1 and 2). The most common taxonomical families included the insect-infecting Iflaviridae (44.66%, n = 17,654; 81.80% of invertebrate infecting viral sequences), followed by Siphoviridae (9.59%, n = 3791) and Podoviridae (7.35%, n = 2906). At the same time, Siphoviridae and Podoviridae represented the majority of bacterial and archaeal virus sequences-35.31% and 27.07%, respectively. A total of 14.24% (n = 5628) of viral sequences could not be effectively attributed to any single taxonomical family ( Table 2).
Mimiviridae and Phycodnaviridae were represented by 84.81% and 59.01% of viral sequences among protozoan, and plant and algal viruses, respectively.
Most sequences, representing viruses with a wide eukaryotic host range were attributed to viruses not assigned to any taxonomical family (98.13%), or to the Peribunyviridae family (1.87%) (See Table 2 and Supplementary Table S1).
Interestingly, viral sequences related to Alloherpesviridae and Malacoherpesviridae, respectively, were found at both Mediterranean and continental locations, in oral swabs and guano. Table 2. Taxonomical (family level) compositions of the studied samples grouped according to various attributes. Table also denotes the Baltimore classifiers corresponding to each taxonomical family taxon and the host ranges, which were identified for the viral sequences contributing to the taxonomical family composite, based on the data in NCBI. The data are stratified according to sample type (S-oral swab, F-feces, G-guano), geographical location (M-Mediterranean, C-continental), landscape type (N-natural, H-with human activity/anthropogenic) and according to bat species (MS-M. schreibersii, MM-M. myotis, RF-R. ferrumequinum, ME-M. emarginatus, ES-E. serotinus, MB-M. blythii).

Families
Host

Vertebrate Viruses
Vertebrate virus sequences were mostly comprised of Retroviridae (27.89%), Parvoviridae (22.65%), Iridoviridae (13.14%) and Poxviridae (11.51%). Of note, Reo-, Herpes-, Flavi-, Circo-, Astro-, Coronaand Picornaviridae each contributed between 1 and 10% of the sequences attributed to vertebrate-infecting viruses, whereas other important vertebrate pathogen viral families such as Adeno-, Calici-, Polyoma-, Peribunya-, Asfaviridae each contributed less than 1% of sequences attributed to vertebrate-infecting viruses. It should also be noted that among the taxonomical families that are known to contain some of the most noteworthy viruses pathogenic to vertebrate hosts, such as Polyomaviridae and Peribunyaviridae were only detected in guano samples, Adenoviridae only in swab and feces samples, Coronaviridae only in swab and guano samples, whereas Asfaviridae were only found in swab samples. The presence of Asfaviridae was only suggested for a single swab sample pool from location 6. Coronaviridae sequences were found in four samples from four different locations (locations: 2, 3, 4 and 11) (Figure 1). We detected traces (five viral contigs) of Rhabdoviridae in a single sample: a pool of guano specimens collected the continental location 3. All samples mentioned above in relation to Asfaand Coronaviridae originated from locations relatively devoid of human activity. A complete summary of metagenomic assignments is available in Supplementary Table S1.

Taxonomical Families of Vertebrate Viruses Interesting Due to Zoonotic Potential
Among 922 retroviral sequence contigs from 36 different samples, there were several species-level taxonomical assignments indicating the presence of human retroviruses, such as human endogenous retroviruses H, K and W, human T-lymphotropic retrovirus, multiple sclerosis-associated retrovirus, human mammary tumor retrovirus, etc., the matching sequences exhibited 45-90% identities at alignment lengths of a mere 30-100 amino acids. There was also a single sequence that, in translation, contained a 30-amino-acid segment with identical to a part of the env protein of human immunodeficiency virus 1 (HIV-1).
The next most abundantly identified group of vertebrate viruses with zoonotic potential were Parvoviridae, where we managed to reconstruct two complete genomes of a novel adeno-associated virus and a novel densovirus, and the dataset suggested a further presence of human parvoviruses 4 and B19, with 1 and 2 sequences, respectively, all alignments indicated between 60% and 64% similarity to the database protein sequences and at alignment lengths between 34 and 36.
Among Reoviridae we found indices of Rotaviruses A, B, C, F, G and others, including several of their taxonomical subunits. Most interesting were the sequences indicating the presence of Human rotavirus A, highlighted by the presence of sequences representing several of its genes (VP2, VP3, VP6, VP8, NS3, NS5), at similarity values ranging from 45-100%. All reovirus sequences were found in continental locations 1, 2, 3 and 4, surrounded both, by natural and anthropogenic landscape, in swab, feces and guano samples.
Among Herpesviridae we found indications of presence of Human alphaherpesvrus 1 (Human herpesvirus 1-HHV-1/Herpes simplex virus-HSV; 44 sequences), Human betaherpesvirus 5 (human cytomegalovirus-HCMV; 3 sequences) and Human gammaherpesviruses 4 and 8 (Epstein-Barr virus-EBV and Kaposi's sarcoma-associated herpesvirus-KSHV; 3 and 2 sequences), in nine, three and one different samples, respectively. KSHV alignments were 65 and 46 amino acids long with 34% and 50% identity to the reference protein sequences; they did not indicate any specific protein function. EBV sequences indicated variable alignment similarities of 58%, 74% and 45% at alignment lengths of 69, 98 and 95 amino acids, and indicated relatedness to a ribonucleotide reductase, a viral Stanniocalcin analog and a protein without a known function, respectively. Sequences resembling HCMV aligned at relatively low similarities of 35%, 50% and 53% at alignment lengths of 128, 45 and 80, respectively. While the first of the three sequences aligned to a protein without a specific function, the latter two hinted relatedness to a viral chemokine family protein. All HHV-1-like sequences resembled a viral glycoside hydrolase enzyme at variable similarity rates, ranging from 35 to 100%. In all of the above cases, we would speculate that she sequences either originated from a similar herpesvirus or actually represent host contaminants. This is due to the fact that (i), in all cases, only a single protein (or proteins from a single focal point in the viral genome) was detected, as well as (ii) the mentioned low sequence similarities at the limited alignment lengths.
Adenoviridae-like sequences resembled mainly different bat, porcine and canine mastadenoviruses. A single sequence indicated similarity to the Pre-hexon linking protein IIIa of Human mastadenovirus F, although at similarity level of 75% and at the alignment length of 52 amino acids.
Coronaviridae-like sequences included representatives of both alpha and betacoronaviruses, but mainly limited to bats in terms of host ranges. A single sequence indicated peak protein sequence similarity to short stretch of the spike glycoprotein of severe acute respiratory coronavirus (SARS-CoV) found in bats in Yunnan province, China in 2016. Although the alignment exhibited 91% identity, the sequence stretch spanned a mere 32 amino acids, and could have actually originated from another, similar, but so far unknown, betacoronavirus.
We did not identify any potential human pathogens among Flaviviridae-like sequences, these resembled mainly Pestiviruses A and B, one sequence resembled a protein from Hepatitis virus GB type B. Among Peribunya-, Astroand Caliciviridae-like sequences, we found indication of only bat virus-like sequence, while Picornaviridae several sequences resembling insect-infecting viruses, with no viruses with potentially harmful to humans. All five Rhabdoviridae-like sequences resembled plant rhabdoviruses and none indicated presence of mammalian rhabdoviruses. Presence of African swine fever 1 might have been suggested by the detection of the single Asfaviridae sequence and the single Polyomaviridae sequence indicated presence of an arachnid polyomavirus.

Three Complete and One Nearly Complete Genome Sequences of Novel Viruses Were Identified
After de novo assembly of read pairs we identified and characterized three complete novel genome sequences, including an adeno-associated virus (Adeno-associated virus Croatia cul1_12; GenBank Acc. No.: MN099037), a densovirus (Ambidensovirus Croatia 17_S17; GenBank Acc. No.: MN099038) and of a novel circo-like virus (Circo-like virus Croatia 17_S17; GenBank Acc. No.: MK241555). Furthermore, two partial, nearly complete, genome sequence contigs (Iflavirus sp. strain 15/G-Me polyprotein gene and Iflavirus sp. strain 16/F-Rf polyprotein gene; GenBank Acc. Nos.: MG963177, MG963178), that may originate from divergent viruses of the Picornaviriales order, likely from the Iflaviridae family, were identified. The complete genome sequence Circo-like virus Croatia 17_S17 has been described previously in a separate publication [35]; most notably, it was found to display peak similarity scores to viruses found in human-derived samples.

Iflavirus
Two sequence contigs, likely originating from viruses of the family Iflaviridae (order Picornavirales), 4201 and 1864 nt in length, respectively, were found in guano collected from under mixed colonies of Myotis emerginatus and R. ferrumequinum and from the individual feces of R. ferrumequinum at location 7. Sequences were generated from 1158 sequencing reads with mean coverage = 93 (GenBank accession No.: MG963177, MG963178). The two-nucleotide sequence contigs assembled de novo displayed 74% and 78% NCBI blastn identities, to Spodoptera exigua iflavirus 2 isolates from Spain (GenBank Acc. No.: KJ186788) and Korea (GenBank Acc. No.: JN870848), respectively.

Ambidensovirus
A complete genome sequence of a novel virus, named Ambidensovirus Croatia 17_S17, exhibiting peak NCBI blastn identities to denso and denso-like virus sequences in GenBank, was identified in The identified ORFs putatively encode three parvoviridal non-structural proteins (NS1, NS2 and NS3) in the 5 terminal region of the positive DNA strand, and two structural proteins (VP1 and VP4) in the 5 terminal region of the negative DNA strand. The complete nucleotide genome sequence was covered on average 71.48× by 3126 sequencing reads and indicated peak nucleotide sequence identity values of 70.6% and 64.2% to the NCBI RefSeq complete genome sequences NC_031450 (Parus major densovirus isolate PmDNVJL, complete genome) and NC_005041 (Blattella germanica densovirus 1, complete genome), respectively, as determined by sequence demarcation toolkit (SDT; v1.2; [36]). The NS1 protein sequence of the novel virus displayed a peak amino acid sequence identity of 60.15% to sequence NP_874381, the Blatella germanica densovirus 1 NS1 protein. Phylogenetic clusterings, embedding all available Densovirinae NS1, Rep1, Rep68 and Rep78 proteins (obtained from the non-redundant protein dataset), and complete genome sequences (obtained from RefSeq) in NCBI, indicated monophyletic relationships of the novel virus with the Ambidensovirus clade (Figure 2A).

Adeno-Associated Virus (AAV)
A complete genome sequence of a novel adeno-associated virus, named Adeno-associated virus (AAV) Croatia cul1_12, NCBI blastn identities to adeno-associated virus sequences in GenBank, was found in oral swabs collected from R. ferrumequinum at location 9. The complete genome sequence of the novel AAV was 4561 nt long and indicated the presence of two ORFs, encoding a putative replicase (Rep; in the 5′-terminal region) and a putative capsid protein (Cap, in the 3′-terminal region); both ORFs were found on the same DNA strand. The complete genome sequence was flanked by 38 nt long All phylogenetic trees were inferred using IQ-TREE, best-fitting phylogenetic models were selected automatically based on the Bayesian information criterion; all trees were visualized using FigTree and rooted at mid-point. The complete genome phylogenetic trees were constructed based on the complete genome sequences matching Densovirinae (A.1) and Parvovirinae (B.1) taxons in NCBI RefSeq using phylogenetic models GTR+F+R4 and GTR+F+R6, respectively. The NS1/Rep protein phylogenetic trees were constructed based on NS1 and Rep proteins sequences from the NCBI Non-redundant protein dataset matching Densovirinae (A.2) and Parvovirinae (B.2) taxons, using phylogenetic models VT+F+R4 and VT+F+R9.

Adeno-Associated Virus (AAV)
A complete genome sequence of a novel adeno-associated virus, named Adeno-associated virus (AAV) Croatia cul1_12, NCBI blastn identities to adeno-associated virus sequences in GenBank, was found in oral swabs collected from R. ferrumequinum at location 9. The complete genome sequence of the novel AAV was 4561 nt long and indicated the presence of two ORFs, encoding a putative replicase (Rep; in the 5 -terminal region) and a putative capsid protein (Cap, in the 3 -terminal region); both ORFs were found on the same DNA strand. The complete genome sequence was flanked by 38 nt long inverted terminal repeat (ITR) regions on the 5 and 3 termini, the complete sequence was covered in average 91.27×, by 4117 sequencing reads. Phylogenetic clustering and nucleotide and amino acid identity values were estimated by embedding the complete genome sequences (RefSeq) and NS and Rep sequences (Non-redundant protein dataset) in NCBI, corresponding to the taxonomical sub-family Parvovirinae, and indicated a peak nucleotide sequence identity of 68.8% to the RefSeq sequence NC_014468 (Bat Adeno-associated virus YNM, complete genome) and a 70.5% peak amino acid identity value in the replicase protein to sequence YP_680424.1, representing the Rep40 protein of Adeno-associated Virus 2. Both the complete genome and the replicase protein of the novel virus clustered monophyletically alongside dependoparvoviruses ( Figure 2B).

Quantitative Analysis of Diversity and the Viral Compositions of Sample Groups
Viral diversities in samples were estimated by calculating the zeroth-(number/count of different categories in sample/group) and first-order (normalized count expressing deviation from uniformity of representation) Hill diversity numbers, both in the context of individual samples as well as based on data, grouped according to different attributes: geographical location, landscape type, sample type and bat species (Tables 1 and 3, Figure 3). In feces, guano and swabs we identified 30, 56 and 39 different viral families, respectively (Table 1). Grouped according to bat species, we found most viral families in samples obtained from R. ferrumequinum, followed by Miniopterus schreibersii, Myotis myotis, E. serotinus, Myotis emarginatus and Myotis blythii (Table 1). We found more different taxonomical families in samples originating from locations of natural landscape, compared to those affected by human activity (Table 1). Similarly, continental locations exhibited higher numbers of different viral families than Mediterranean ones (Table 1).
It should be noted that Table 1 lists the diversity indices respective of the various groupings, as calculated directly from the sets of different categories identified (and their relative abundances) in the selected segment of the data, while Figure 3 intends to visualize the spreads of the diversity indices calculated from individual samples (Table 3) belonging to the relevant groups. In contrast to the somewhat drastic differences between the taxonomical 0 D indices in Table 1, in the case of grouping according to sample type no statistical significance could be identified (p < 0.05, [34]) in the sample-based groupwise comparison of taxonomical 0 D indices (Figure 3). On the other hand, the taxonomical 1 D and the host range 0 D indices also indicated statistically significant discrepancies (p < 0.05 [34]). In further detail, in the context of 1 D tax , both feces and guano samples appeared to differ from swab samples, whereas in the context of 0 D host range swabs and feces differed from guano samples (p < 0.05; Mann-Whitney U test). Samples from continental Croatia displayed significantly higher numbers of both unique taxonomical families as well as host-range categories ( 0 D, Figure 3). On the other hand, Mediterranean samples displayed significantly higher 1 D diversities, suggesting higher evenness of the represented categories (Figure 3). This seems reasonable since only 9780 viral sequences were obtained from continental but 29,747 from Mediterranean locations and could suggest an insufficient sampling depth of continental locations. Importantly, significant differences could also be observed between the medians of sample 0 D tax , 1 D tax , 0 D host indices following grouping according to sample type (Figure 3). Table 3. Metadata summary of the sequenced pools. Table lists     . Taxonomy and host-range-based Hill diversity indices (D) calculated by grouping samples according to sample type (oral swab, feces, guano), geographical locations from where the samples were obtained (Mediterranean, continental) and whether the surrounding landscape is subject to human activity (Natural, Human act.). * denotes there may be statistically significant differences (p < 0.01) between the population medians of the strata, based on the Kruskal-Wallis test. The diversity indices are depicted a violin plots, widths of the "violins" correspond to the kernel density estimates at the given ordinate, and as box and whisker plots, indicating the data mean (orange line), the box stretches between the first (Q1) and the third quartile (Q3), and the whiskers enveloping the most extreme values lower than Q1 or higher than Q3 by more than 1.5 * (Q3-Q1); values beyond the whisker definition are depicted as individual circles/points. Taxonomy and host-range-based Hill diversity indices (D) calculated by grouping samples according to sample type (oral swab, feces, guano), geographical locations from where the samples were obtained (Mediterranean, continental) and whether the surrounding landscape is subject to human activity (Natural, Human act.). * denotes there may be statistically significant differences (p < 0.01) between the population medians of the strata, based on the Kruskal-Wallis test. The diversity indices are depicted a violin plots, widths of the "violins" correspond to the kernel density estimates at the given ordinate, and as box and whisker plots, indicating the data mean (orange line), the box stretches between the first (Q1) and the third quartile (Q3), and the whiskers enveloping the most extreme values lower than Q1 or higher than Q3 by more than 1.5 * (Q3-Q1); values beyond the whisker definition are depicted as individual circles/points.
Analysis of Spearman rank correlation coefficients (ρ) indicated weak positive correlations might be present between the number of viral sequences and the values of the 0 D indices in the dataset. The ρ( 0 D tax ) and ρ( 0 D host ) amounted to 0.585 and 0.558, with empirical probabilities for uncorrelated systems of 3.84 × 10 −5 and 1.02 × 10 −4 , respectively, although these probability values may not be reliable at the tested sample size (N samples = 43). A complete/representative sampling of given a population could be considered as one, where the addition of new data regarding that same population does not change the end result (information) significantly-in other words, a steady state, where all variation can be attributed to random dispersion, that is, a system where no correlation would be observed for the abovedescribed statistical setup.
Most obviously, the present data suggested that the viral composition in a given sample group could to a large degree be explained by the proportion of viral sequences derived from one of the three sample types, namely swab, feces or guano (Figure 4). Feces samples contained the highest proportion of invertebrate viruses; guano samples the highest proportion of bacterial and archaeal viruses, whereas vertebrate infecting viruses were most commonly represented in swab samples. Figure 3 clearly demonstrates that proportions of viral sequences from a given sample type play a crucial role in the host-range-based composition.
Viruses 2020, 12, x; doi: FOR PEER REVIEW www.mdpi.com/journal/viruses Analysis of Spearman rank correlation coefficients (ρ) indicated weak positive correlations might be present between the number of viral sequences and the values of the 0 D indices in the dataset. The ρ( 0 Dtax) and ρ( 0 Dhost) amounted to 0.585 and 0.558, with empirical probabilities for uncorrelated systems of 3.84 × 10 −5 and 1.02 × 10 −4 , respectively, although these probability values may not be reliable at the tested sample size (Nsamples = 43). A complete/representative sampling of given a population could be considered as one, where the addition of new data regarding that same population does not change the end result (information) significantly-in other words, a steady state, where all variation can be attributed to random dispersion, that is, a system where no correlation would be observed for the abovedescribed statistical setup.
Most obviously, the present data suggested that the viral composition in a given sample group could to a large degree be explained by the proportion of viral sequences derived from one of the three sample types, namely swab, feces or guano (Figure 4). Feces samples contained the highest proportion of invertebrate viruses; guano samples the highest proportion of bacterial and archaeal viruses, whereas vertebrate infecting viruses were most commonly represented in swab samples. Figure 3 clearly demonstrates that proportions of viral sequences from a given sample type play a crucial role in the host-range-based composition.

Discussion
Bats represent an important reservoir of emerging and re-emerging viral diseases for humans and animals [37]. Using metagenomic analysis based on NGS, the frequency of new bat virus discovery has increased drastically in recent years [38]. Herein, the first insights into the virome of the Croatian bat population have been revealed. We examined the viral metagenomes in oral swabs, feces and guano of Croatian bats.

Discussion
Bats represent an important reservoir of emerging and re-emerging viral diseases for humans and animals [37]. Using metagenomic analysis based on NGS, the frequency of new bat virus discovery has increased drastically in recent years [38]. Herein, the first insights into the virome of the Croatian bat population have been revealed. We examined the viral metagenomes in oral swabs, feces and guano of Croatian bats.
In the present study, 2.52% (39,527/1,565,543) ( Table 1) of sequences exhibited similarity to known viruses. Slightly lower values have been reported previously in the context of similar studies, such as 1.62% in Myanmar [38] and 0.92% overall in China [39].
We identified a total of 63 viral families (Table 1), while a total of 24 [38] and 51 [40] taxonomical families of viruses have been reported previously in Myanmar and French Guiana, respectively. We considered species level taxonomy assignments as unreliable, and they were rather summarized to the level of taxonomical families, as the minimum nucleotide sequence length of 100 nt (and 33, translated to amino acid code) does not offer sufficient species level specificity and the metagenomic outputs should be interpreted with extreme caution. We consider identification of a given viral taxonomical family in the metagenomic output as indication of presence, whereas only assembly of complete genome or partially assembled genome sequences can serve as firm evidence for the presence of a given virus.
Eukaryotic viruses predominated with 67.13% of viral sequences while prokaryotic (bacterial and archaeal) viruses were represented by 27.16% of viral sequences which is similar to the findings of He et al. (73.5% eukaryotic viruses and 26.5% prokaryotic viruses) [38]. Among prokaryotic viruses, Siphoviridae and Podoviridae were the most commonly identified viral families (Table 2), which is in line with results of previous studies: Siphoviridae and Podoviridae were the two most commonly identified prokaryote-infecting viral families in He et al. [38] and Salmier et al. [40].
The largest proportion of viral sequences related to eukaryotic viruses (83.18%) was attributed to viruses infecting invertebrates. With the exception of the Egyptian Fruit bat Rousettus aegyptiacus which feeds on fruit, all known European species of bats are insectivorous [1,2], and it is likely that the high relative abundance of insect-infecting viral sequences reflects the bats' diet, highlighting the bat's bio-insecticidal role in the environment. It should also be noted that the highest relative abundance of invertebrate-infecting viral sequences was found in feces, reinforcing the idea of their "alimentary" origin. The notion of the bats' bio-insecticidal role in Croatia was further reinforced by the identification of two partial genome sequences, assembled de novo during the present study from sequencing reads, of a currently unknown virus likely belonging to the Iflaviridae family, a family of insect viruses. The two sequence contigs were found in guano and feces collected at a Mediterranean location 7 and most closely resembled sequences of Spodoptera exigua iflavirus 2 from Spain and Korea. Spodoptera exigua is a species of moth, a common and well-known agricultural pest infesting different vegetables commonly grown in Croatia [41]. Iflaviridae were also the most commonly identified viral family in this study, as well as in samples from Mediterranean Croatia.
We also identified a complete genome sequence of a novel ambidensovirus, of the Parvoviridae family. The novel ambidensovirus indicated peak similarities to an ambidensovirus identified from the bird great tit (Parus major) [42] and the German cockroach (Blattella germanica) [43]; however, it should be classified as a novel species in the Ambidensovirus genus, according to current ICTV taxonomic classification criteria [44]. The closely related densovirius, detected by Yang and colleagues [42], was originally found in the lung tissue of a great tit, however, they were unable to determine whether the bird was infected by the densovirus or if the densovirus come from insects ingested by the bird without infection of the avian cells [42]. The fact that the ambidensovirus was detected in guano and was also highly similar to a densovirus identified in the German cockroach [43] indicates an alimentary origin of the virus.
An AAV, another member of the Parvoviridae family, subfamily Parvovirinae, genus Dependoparvovirus, was also identified. Adeno-associated viruses have up to now been recorded in bats from China, the USA and Myanmar [45], and, according to our best knowledge, this is the first evidence of an AAV in European bats. Additionally, this is first record of AAV in bat oral swab while in other research it was found in feces [45]. Although AAV usually requires co-infection with helper adenovirus or herpesvirus [46] none of these viruses could be found in the respective swab sample. According to current ICTV taxonomic classification criteria [47], the novel virus should be classified as a novel species in the Dependoparvovirus genus.
A novel circo-like virus, was identified in this study, circo-like virus Croatia 17_S17 (Circo-like virus Croatia 17_S17, GenBank Acc. No.: MK241555) has been described by our research group previously in the context of a genome announcement [35]. It indicated a similarity to Circo-like viruses Brazil HS1 and HS2, which were identified in human feces [48,49].
Sequences related to vertebrate viruses made up 12.73% of the bat virome, similar to reports from North America (<10%) [50,51], but contrasts the 45.2% vertebrate infecting viruses in Myanmar [38]. The observed difference could be attributed to the sample types used: studies where a lower percentage of sequences were related to vertebrate viruses, including our predominantly used feces, oral and/or fecal swabs as samples, while He et al. [38] used actual bat organ tissues. Furthermore, our results suggested a strong influence of sample type over the identified virome, as differences in both diversity and compositions of samples originating from different tissues. Feces contained the highest proportion of invertebrate viruses, guano bacterial and archaeal viruses, whereas vertebrate infecting viruses were most commonly represented in swabs. The highest relative abundance of invertebrate viruses in feces likely reflects the diet of bats while the predominance of prokaryotic viruses in guano is probably consequence of its bacterial colonization and decomposition. Considering that some of the viruses may not be excreted through fecal and oral routes, and some of them may have intermittent excretion, feces and oral swabs might not reveal the complete viromes [38]. However, due to the protected status of European bats, the chosen samples were only available.
Although swabs were the most numerous sample types (65.12% of samples, 70.75% of reads) in this study, they contributed only 13.15% viral sequences. On the other hand, despite the highest content of viral sequences and overall contribution (50.47%) to all viral sequences, feces exhibited the lowest viral diversity ( 0 D tax = 30), and the least uniformly distributed representation of the identified viral families (the highest 1 D tax / 0 D tax = 0.015). In contrast, guano had the highest viral diversity (56 viral families), both guano and swab samples indicated comparable 1 D tax / 0 D tax values (0.0021-swab; 0.0018-guano), indicated similar evenness of the relative abundances of the identified taxonomical families.
One of the most interesting findings in this study was the difference in viromes between two contrasting habitats, the continental and Mediterranean regions of Croatia. Most viral sequences came from Mediterranean habitats, but it was habitats from continental Croatia that contributed the most sequences in total (Table 1). On the other hand, even though the majority of viral sequences were detected at Mediterranean locations, the higher viral diversity was actually found in continental Croatia. This finding was primarily due to the presence of diverse plant viral families such as Betaflexviridae, Bromoviridae, Luteoviridae, Solemoviridae, Tombusviridae, Tymoviridae, and Virgaviridae. These plant viral families probably reflect the plant-based diet of the insects included in the bats alimentary chain. Additionally, when this finding is put in a geographical context, the majority of continental sampling sites were located at the Pannonian basin, one of the major European agricultural areas [52], this would further solidify the aspect of the bio-insecticidal role of bats, as well as their role as a bioindicator in the Croatian ecosystem. Viewed through the lens of evenness according to the 1 D tax / 0 D tax ratio, continental Croatia exhibited a more uniform distribution of the identified viral families in comparison to Mediterranean Croatia. It is generally assumed that a disturbance into an ecological system, such as the heavy agricultural exploitation of the Pannonian basin, would diminish the diversity and evenness of its occupants [53,54], thereby reducing the system's ability to adapt to change, leading to relative extinction of the majority and the proliferation of some species. In the present case, we would have expected a higher taxonomical evenness-a lower 1 D tax / 0 D tax ratio-in continental compared to Mediterranean Croatia. On the other hand, cross-referencing the continental/Mediterranean sample grouping with grouping based on landscape type revealed that while approximately one-third of Mediterranean locations were considered as anthropogenic (7/23) while only one-sixth of such samples came from continental Croatia. It was also of interest that the mentioned grouping, according to human activity in the surrounding landscape, suggested both a higher species richness in natural locations ( 0 D tax : 61 vs. 45) as well as higher evenness ( 1 D tax / 0 D tax ratio: 0.0013 vs. 0.0095).
Regarding bat species, the highest number of viral sequences and highest viral diversity was found in R. ferrumequinum and Miniopterus schreibersii, which is likely also a consequence of unbalanced sampling, as these bat species were the most represented in the samplings.
Among the raw metagenomic output, we did detect clues suggesting the presence of potential human pathogens, specifically among sequences resembling Retro-, Poxand Herpesvidiae. It is interesting that, in most cases, poxvirus-like sequences indicated similarity to different viral enzymes or receptors, such as the Molluscum contagiosum virus glutathione peroxidase, Orf virus analog of a type 2 taste receptor, a vaccinia virus analog of a glycoside hydrolase enzyme, the ectromelial complement control protein C3/B5, etc., all viral proteins involved in the interplay with the host organisms and immune systems, which per se also indicate high similarities to their respective mammalian analogues. It may be that all these viral proteins, and by consequence the virus species in question were present in the samples, but it is also possible that the sequences came from the protein's mammalian analogues. Retroviridae on the other hand have coexisted with some of their mammalian hosts for very long evolutionary periods and may exist in their hosts genomes in the form of endogenous retroviruses [55][56][57][58], and in some cases the delineation between host and viral DNA may have faded through away through time. Although it is possible that all these retroviruses were present in the bat samples these species-level taxonomy calls we considered as less reliable as they may also indicate presence of viruses similar to those detected (i.e., from the same governing taxonomical unit, such as family), but in the specific cases, mentioned above, may also have been misinterpreted and actually originated from the hosts DNA. From the same aspect of curation, these results may suggest, that the relative abundances of Retro-, Poxand Herpesviridae may have been overestimated in this study. On the other hand, the fact that we identified a large portion of reoviral genes in our samples, we do not doubt their presence, it should however be noted (again) that, due to low similarity values it is more likely that the sequences originated from different unknown rotaviruses, and not specifically human rotavirus A.
On a note of caution, the observed correlation between the number of viral sequences and the diversity indices could be interpreted to suggest that more viral families would be identified if the sequencings were augmented in depth. Also, as Li and colleagues [50] stated viral metagenomics can provide relevant sequence data on the most prevalent viruses present in samples but underestimate the diversity of low-concentration viruses. Sequencing depth is thus one of the most important factors to consider in the design of future viral metagenomics studies.
In this study, approximately 14% of viral sequences could not be assigned to any known viral family, which is a lower rate in comparison to similar research [50,54] and could be due to growth of publicly available reference genome databases during last year's. In the future it could be expected further decrease in this portion [59].
Finally, great care should be taken when interpreting the results of metagenomic studies, especially when it comes to an understudied field such as viromics. Although metagenomics can be a great investigative tool to nonspecifically uncover the viromes and metagenomes of various samples, it is in the same manner prone to the error. As there is no reference genomes for all viruses it is impossible to exclude that some sequences were classified inaccurately, i.e., errors may simply arise from unspecific sequence mappings. This was supported by finding parts of genomes of some viral families which are not typical for bats such as Alloherpesviridae and Malacoherpesviridae, known to normally infect fishes and molluscs.

Conclusions
The viral metagenomics data presented in this study provide a preliminary view of virome in Croatian bat population. The primary aim of this study was to facilitate the assessment of the Croatian bat population as a potential reservoir of viral pathogens with zoonotic potential. The further characterization of the bat virome will increase our understanding of mammalian virus diversity and timely detection of potential human pathogens. This study contributes significantly to better understanding of global diversity of bat viruses. Most importantly, in order to compare viral populations in bats from different parts of the world or of different species, effectively-factors such as sample type, habitat and sequencing depth are paramount and should be considered and evaluated with utmost rigor. Also, presence of adequate reference databases play crucial roles in the definitions of viral sample compositions, and further interpretations related to downstream analysis.