Incipient Sympatric Speciation and Evolution of Soil Bacteria Revealed by Metagenomic and Structured Non-Coding RNAs Analysis

Simple Summary The microevolutionary dynamics of soil bacteria under microclimatic differences are largely unexplored in contrast to our improving knowledge of their vast diversity. In this study, we performed a comparative metagenomic analysis of two sharply divergent rocks and soil types at the Evolution Plateau (EP) in eastern Upper Galilee, Israel. We have identified the significant differences in bacterial taxonomic diversity, functions, and patterns of RNA-based gene regulation between the bacteria from two different soil types. Furthermore, we have identified several species with a significant genetic divergence of the same species between the two soil types, highlighting the soil bacteria’s incipient sympatric speciation. Abstract Soil bacteria respond rapidly to changes in new environmental conditions. For adaptation to the new environment, they could mutate their genome, which impacts the alternation of the functional and regulatory landscape. Sometimes, these genetic and ecological changes may drive the bacterial evolution and sympatric speciation. Although sympatric speciation has been controversial since Darwin suggested it in 1859, there are several strong theoretical or empirical evidences to support it. Sympatric speciation associated with soil bacteria remains largely unexplored. Here, we provide potential evidence of sympatric speciation of soil bacteria by comparison of metagenomics from two sharply contrasting abutting divergence rock and soil types (Senonian chalk and its rendzina soil, and abutting Pleistocene basalt rock and basalt soil). We identified several bacterial species with significant genetic differences in the same species between the two soil types and ecologies. We show that the bacterial community composition has significantly diverged between the two soils; correspondingly, their functions were differentiated in order to adapt to the local ecological stresses. The ecologies, such as water availability and pH value, shaped the adaptation and speciation of soil bacteria revealed by the clear-cut genetic divergence. Furthermore, by a novel analysis scheme of riboswitches, we highlight significant differences in structured non-coding RNAs between the soil bacteria from two divergence soil types, which could be an important driver for functional adaptation. Our study provides new insight into the evolutionary divergence and incipient sympatric speciation of soil bacteria under microclimatic ecological differences.


Introduction
Diverse microbiome populations play a critical role in the environments ranging from soil to human and animal guts [1,2]. The soil microbiome plays a functional role in carbon cycling, decomposition of organic matter, nutrient transformation, maintenance of ecosystem sustainability [3], and supporting plant growth while providing vital ecosystems [2,4]. Vice versa, the microbiome could be affected by their depending environments [5]. Correspondingly, different strains of the same species can be significantly distinct in their gene content and single-nucleotide polymorphisms (SNPs) [6,7]. These strain-level differences have shed light on understanding adaptation, evolution, and sympatric speciation of the microbiomes. For example, differentiated ecologies drove the population subdivision of ocean bacteria [8] and also the presence of subpopulations in ancient marine bacteria [9]. Emerging shreds of evidence support that changes in ecology and environmental patterns drive the evolution of microbial species [10][11][12][13]. A recent study demonstrated the rapid strain level evolution of soil bacteria in response to climate changes [14]. To quickly adapt to the new environment, bacteria rapidly mutate their gene sequences [15,16] and non-coding RNAs [17], which can generate new functionality and regulatory mechanisms. This mechanism could promote the continuous emergence of new strains [18,19]. The continuous genetic changes and fitness stress associated with the new ecology may generate new bacterial species.
However, the origin of species has been controversial since first suggested by Darwin [20]. The allopatric speciation common model hypothesized that populations evolve into new species when geographically isolated, unaffected by the homogenizing process of gene flow, and thus, they can accumulate sufficient genetic divergence and reproductive isolation to evolve into new species. By contrast, the sympatric speciation (SS) model hypothesized that a new species might originate as a local variety [20], and in general, one species splits into two sister species without any physical barriers to block gene flow between the two populations. In modern terminology, a new species can evolve within a smaller space characterized by free breeding meta-populations with gene flow, provided there are divergent ecologies in the speciation theater [21]. The SS model was and is controversial since suggested and still considered rare by most biologists, though mounting evidence in eukaryotes, both theoretical and empirical [22][23][24][25], does indicate that new species indeed can originate despite ongoing gene flow (Table S1). The SS model becomes even more enigmatic in bacteria whose systematics is still problematic. We proposed a new paradigm shift to incorporate ecology and divergent genetics in bacterial systematics [26] based on our two microsite evolution models, the microclimatic microsite model [27][28][29][30], and the geological-edaphic microsite model [24,25,31]. In the first model designated "Evolution Canyon" (EC), a tropical, hot and dry, savannoid biome abuts with temperate, cool and humid, forested biome across a few hundred meters. In the second model, two different rocks and soil types abut at "Evolution Plateau" (EP) in eastern Upper Galilee: Senonian chalk abuts with Pleistocene volcanic basalt. Both models, the microclimatic and edaphic, demonstrated hotspots of SS, primarily unfolded in eukaryotes from bacteria to mammals, with only one case of prokaryotes, in the soil bacterium Bacillus simplex [32]. Here, we asked whether SS is relevant in prokaryotes, as we and others (Table S1) highlighted in eukaryotes. Our criterion of identifying sympatric speciation is based on our new ecological paradigm linked with little to medium genetic change between the progenitor and derivative species speciating sympatrically in the new ecology. Shotgun metagenomic sequencing and biocomputing are widely applied to study the genetic diversity of the uncultured microbial communities in diverse environments [1]. Recently, strain-level analysis of metagenomes has shown that shotgun metagenomic sequencing has the potential to understand strain-level heterogeneity within bacterial genomes and between microbiome communities in contrast to only the sequencing of 16S ribosomal RNA [33]. All these current approaches to detect strain-level variations are dependent on the availability of a collection of microbial reference genomes to identify SNPs and quantify gene content in microbial species [33,34].
To study the bacterial evolution, including the SS within the microbiomes of different soils, we compared the metagenomes from two sharply divergent rocks and soil types (chalk weathering into rendzina soil and volcanic basalt into basalt soil) at the EP, in the eastern Upper Galilee Israel ( Figure 1A). The basalt is clayey, wetter, and muddy with low permeability ( Figure 1B); in contrast, the chalk is drier and barren with better ventilation ( Figure 1C). Ca 2+ is the most abundant element in the chalk soil ( Figure 1D), followed by Si 4+ , Al 3+ , Fe 3+ , and Mg 2+ , and Si 4+ is the most abundant chemical in the abutting contrasting basalt soil, followed by Al 3+ , Fe 3+ , and Ca 2+ . Al 3+ , Fe 3+ , and Ca 2+ contents are significantly different between the two soils [35]. Thus, basalt is generally acidic, while chalk is slightly alkaline [36]. Such contrasting ecologies without any physical barriers lead to population divergences with gene flow, including bacteria [32], wild barley [37], and blind mole rat [24,25,31,[38][39][40][41]. This present study demonstrated that the bacterial community composition has significantly diverged between the chalk and basalt soil types. Furthermore, we observed that the bacteria from two different soil types have significant differences in functions and preferences of structured non-coding RNAs, highlighting the evolution of functional diversity to adapt locally to new ecological stresses. Moreover, remarkably, we identified the incipient sympatric speciation in soil bacteria as we identified in eukaryotes across life in our evolution microsites, based on divergent ecologies, with ongoing gene flow, suggesting that natural selection can overrule the homogenizing effect of gene flow in both eukaryotes and prokaryotes [42].
Biology 2022, 11, x FOR PEER REVIEW 3 of 17 level analysis of metagenomes has shown that shotgun metagenomic sequencing has the potential to understand strain-level heterogeneity within bacterial genomes and between microbiome communities in contrast to only the sequencing of 16S ribosomal RNA [33]. All these current approaches to detect strain-level variations are dependent on the availability of a collection of microbial reference genomes to identify SNPs and quantify gene content in microbial species [33,34].
To study the bacterial evolution, including the SS within the microbiomes of different soils, we compared the metagenomes from two sharply divergent rocks and soil types (chalk weathering into rendzina soil and volcanic basalt into basalt soil) at the EP, in the eastern Upper Galilee Israel ( Figure 1A). The basalt is clayey, wetter, and muddy with low permeability ( Figure 1B); in contrast, the chalk is drier and barren with better ventilation ( Figure 1C). Ca 2+ is the most abundant element in the chalk soil ( Figure 1D), followed by Si 4+ , Al 3+ , Fe 3+ , and Mg 2+ , and Si 4+ is the most abundant chemical in the abutting contrasting basalt soil, followed by Al 3+ , Fe 3+ , and Ca 2+ . Al 3+ , Fe 3+ , and Ca 2+ contents are significantly different between the two soils [35]. Thus, basalt is generally acidic, while chalk is slightly alkaline [36]. Such contrasting ecologies without any physical barriers lead to population divergences with gene flow, including bacteria [32], wild barley [37], and blind mole rat [24,25,31,[38][39][40][41]. This present study demonstrated that the bacterial community composition has significantly diverged between the chalk and basalt soil types. Furthermore, we observed that the bacteria from two different soil types have significant differences in functions and preferences of structured non-coding RNAs, highlighting the evolution of functional diversity to adapt locally to new ecological stresses. Moreover, remarkably, we identified the incipient sympatric speciation in soil bacteria as we identified in eukaryotes across life in our evolution microsites, based on divergent ecologies, with ongoing gene flow, suggesting that natural selection can overrule the homogenizing effect of gene flow in both eukaryotes and prokaryotes [42].

Sampling, DNA Isolation, and Sequencing
A total of 12 soil samples were collected for chalk (6 samples) and abutting derivative basalt (6 samples) from "Evolution Plateau" (EP) in the eastern Upper Galilee Israel in January 2016 and emerged into liquid nitrogen immediately in the field. The samples were from one line and one sample every 50 meter. All the samples were transported to a company for DNA isolation and sequencing. DNA was isolated from each soil sample by using QIAmp DNA stool mini kit (Qiagen, Hilden, Germany). Qualified DNAs were sonicated to about 350 bp fragments; after purification, end-repair, A-tailing, and adaptor ligation were carried out and followed by PCR amplification. After quality control, libraries of each sample were sequenced with 150 bp paired-end reads. A total of~50 million 150 bp paired-end raw reads were generated from the shotgun metagenomic sequencing of each sample.

Taxonomic Compositions of Basalt and Chalk Sample
Kraken2 [50] was used for taxonomic profiling, and the relative abundance was estimated by Bracken [51]. Next, taxonomic and functional data tables were downloaded and fed into the STAMP v2.1.3 (Statistical analysis of metagenomic profile) software [52] for statistical validation. The species with significant differences were identified by LEfSe (linear discriminant analysis effect size, LDA score > 2) [53] on the website (http://huttenhower.sph.harvard.edu/galaxy/, v1.0, accessed on 19 May 2022).

Function Compositions of Basalt and Chalk Sample
High-quality unassembled reads were first uploaded onto the MG-RAST server [54] for functional profiling, and the subsystem database was selected with default parameters. In order to obtain a more accurate contig for the construction of non-redundant gene catalogs, we use MEGAHIT [55] to assemble the reads; next, we use Prodigal [56] to predict coding sequences (CDS, >100 bp) of assembled contig and use CD-HIT [57] (-c 0.95 -d 0 -aL 0.9 -uL 0.05 -aS 0.9) to obtain a non-redundant gene catalog. KoafmKOALA [58] was performed to annotate functional genes against the Kofam database. KEGG enrichment was accomplished by ReporterScore [59]. Carbohydrate active enzymes (CAZyomes) were identified by dbCAN2 [60]. The relative gene abundance was calculated as follows: Step 1: Calculation of the copy number of each gene: b i = x i /L i ; Step 2: Calculation of the relative abundance of gene i: a i = b i /∑b i : a i : the relative abundance of gene I; b i : the copy number of gene i from sample N; L i : the length of gene i; x i : the number of mapped reads.
The calculations mentioned above were performed with our custom Python scripts.

Analysis of Structured Noncoding RNAs in the Metagenome-Assembled Genomic Reads (MAGs) Sequences
Riboswitches are the most common form of structured noncoding RNAs in bacteria [61][62][63][64][65]. The Rfam 12.0 [66] database contains sequence and structural information of more than 40 classes of riboswitches. We have downloaded all the covariance models of each riboswitch class from Rfam. Then, the cmsearch program of the Infernal 1.1 package [67] was used to identify the riboswitches from the metagenome-assembled genomic reads (MAGs). Furthermore, the detected riboswitches were again confirmed with the Riboswitch Scanner [68]. To obtain a local minimum energy structure, aligned consensus structure generated using the covariance model was passed to RNAfold from the Vienna RNA package 2.0 [69] as enforced structure constraints. This enables generating the potential riboswitch structure where the local minimum energy and covariance model are both considered [63,70]. FORNA from Vienna RNA package was used to visualize the riboswitch structure.

Metagenomic Single Nucleotide Variants (SNVs) Analysis
In order to calculate sympatric speciation in bacteria based on single nucleotide variant (SNV), first, human contaminations were removed by mapping trimmed reads to the human reference genome (hg38) using BWA tool [71]. Next, unmapped (non-human) reads were mapped to the CDS sequences of RefSoil+ database [2], which contains 888 bacteria and 34 archaea. Furthermore, mapping of RefSoil+ was subjected to the metaSNV pipeline to calculate pairwise distance between chalk and basalt samples based on population SNVs [34].

Metagenome Assembly
A total of~16 GB of raw reads from the shotgun metagenomic sequencing has been generated (Table S2), and after trimming the low-quality bases, about 15 Gb of clean reads were retained for each of the six samples from basalt and six samples from chalk soils. After assembly, an average of 850,000 contigs (>300 bp) was generated for each sample. Statistical information of assembled contigs (Table S3) shows that these assemblies were sufficient for the taxonomical and functional analysis of our study. We performed a Spearman correlation analysis to understand whether the differences between the two groups of samples (basalt and chalk) are more significant than the biological repeats (samples from the same group). We found that the differences between biological repeats are significantly smaller than between the samples from another group ( Figure S1), which signify the significant differences between the samples from basalt and chalk.

Bacterial Community Composition
Bacteria were the most dominant kingdom in both of the soils, followed by Archaea and Eukaryota. For the bacterial community, 58 phyla, 98 classes, 212 orders, 462 families, 1791 genera, and 6413 species were detected by metagenomic sequencing from 12 soil samples. We identified 32 phyla, 56 classes, 135 orders, 289 families, 981 genera, and 3103 species that were significantly different between basalt and chalk (Welch's t-test, p < 0.05). Among the bacterial phyla, Proteobacteria and Actinobacteria were found to be the most abundant phyla both in chalk soil as well as in basalt soil, accounting for over 90% of the total population of all the phyla, followed by Bacteroidetes, Acidobacteria, Verrucomicrobia, and Chordata ( Figure 2A). Bacterial composition was separated into two clusters in PCoA ( Figure 2B), and it is significantly higher in the more arid chalk soil ( Figure 2C). Actinobacteria was significantly higher in basalt than in chalk ( Figure 2D); in contrast, Proteobacteria, Firmicutes, Planctomycetes, and Bacteroidetes were higher in chalk than in basalt. High abundances of Proteobacteria and Actinobacteria indicate that these phyla played important roles in the soil bacterial communities providing basic functions related to the biogeochemical cycle. Moreover, we performed a t-test (paired) and found significant differences in a relative phylum abundance of Proteobacteria (p = 0.011), Actinobacteria (p = 4.6 × 10 −3 ), Planctomycetes (p = 3.16 × 10 −5 ), Chordata (p = 1.93 × 10 −4 ), and Cyanobacteria (p = 9.87 × 10 −4 ) (Figure 2A). Further genus-level analysis indicates that Streptomyces, Micromonospora, Bradyrhizobium, Kribbella, Actinoplanes, and Amycolatopsis were enriched in basalt, while in chalk, Sphingobium, Burkholderia, Xanthomonas, Mesorhizobium, and Rubrobacter were enriched ( Figure S2). Species with the most differences in abundance between basalt and chalk, revealed by linear discriminant analysis effect size (LEfSe) (Table S4)
The taxonomic assignments of the chalk and basalt soil samples were obtained by analyzing assembled contigs and the gene prediction and the classification of the potential functional genes via Clusters of Orthologous (COG) analysis. We summarized the results into four categories: information storage and processing (cluster I); cellular processes and signaling (cluster II); metabolism (cluster III); and poorly characterized function (cluster IV). Mostly, cluster III was dominant in both chalk and basalt soil samples which is related to the growth of microbial communities. Cluster II was the next most dominant in all the samples, followed by clusters IV and I ( Figure S4). Functional annotation at level 1 revealed that genes related to "carbohydrates", "co-factor, vitamins, prosthetic groups, pigments", "RNA metabolism", "Virulence, Disease and Defense", and "Metabolism of aromatic compounds" were highly dominant in both chalk and basalt microbial communities ( Figure S5). Functional annotation at level 2 demonstrated that genes related to "Diand oligosaccharides", "Biotin", and "Phage and prophages" are most enriched in basalt, while genes related to "Resistance to antibiotics and toxic compounds" and "transcription" are most enriched in basalt ( Figure S6). Therefore, the functional analysis demonstrated that the enriched processes are important for the functional adaptation of the microbes to survive in the stress associated with the specific soil environments. Biology 2022, 11, x FOR PEER REVIEW 8 of 17

Analysis of Structured Noncoding RNAs in Metagenome-Assembled Genomic Reads (MAGs) Sequences
Bacterial structured noncoding RNAs play vital roles in several biological processes such as gene regulation, signaling, RNA processing, and protein synthesis [77]. Among

Analysis of Structured Noncoding RNAs in Metagenome-Assembled Genomic Reads (MAGs) Sequences
Bacterial structured noncoding RNAs play vital roles in several biological processes such as gene regulation, signaling, RNA processing, and protein synthesis [77]. Among these, the most common groups of structured noncoding RNAs in bacteria are riboswitches [61]. Riboswitches are the cis-regulatory structural RNA sensors found in the 5' UTR of bacterial genes and control the expression of the associated genes/operons [78]. There are more than 40 different classes of riboswitch classes discovered so far, which are based on the type of ligand bind with the aptamer domain of specific riboswitches [61,78,79]. Understanding the abundance of specific riboswitch classes in bacteria can provide much deeper functional insights by identifying the metabolic pathway in which it participates and the conditions in which it is expressed. Therefore, we performed the riboswitch analysis to understand the abundance of riboswitch classes in the MAGs from basalt and chalk samples. We identified more riboswitches in the basalts than in chalk ( Figure 4A), indicating that the preference for RNA-based gene regulation could be important for their evolutionary adaptation in the soil of the basalt region. Riboswitches are highly evolutionary conserved at both their sequence and structural levels. Specific mutations in riboswitches could affect the riboswitch conformation by causing a global rearrangement [17,80,81]. In [17], a mutational study was performed at an "Evolution Canyon", and in [80], the rational design was exemplified by mutations predicted by energy minimization [69,81,82] and conformational switching was presented by using energy dot plots that can be analyzed in a variety of ways (e.g., as in [83]). Affecting riboswitches by mutations shows that sequence level variations of a particular riboswitch class that can significantly explain the differences between basalt and chalk bacterial groups could be useful for understanding the beneficial fitness effect in the specific group. For this purpose, we analyzed the TPP riboswitches, where six hits were found in basalt and four hits in chalk. We observed that TPP riboswitches detected in the MAGs from basalt have significantly higher GC content than MAGs in chalk (Avg_GC Basalt = 67.50, AVG_GC Chalk = 63.09, p-value = 0.0287) ( Figure 4B). This observation indicates that GC content variation could shape the regulatory divergence of bacteria to survive in the face of different ecological stresses associated with the chalk and basalt region at Evolution Plateau (EP). Although we found that riboswitches are more abundant in the basalt sample than in chalk, we have detected Fluoride riboswitch in the MAGs of the chalk sample, which is absent in the basalt sample. Figure 4C represents the fluoride riboswitch structure detected in the MAGs from the chalk sample. The fluoride riboswitch is important for the bacterial defense mechanism in counteracting against the high fluoride toxicity by regulating downstream genes that encode putative fluoride transporters, enzymes that are known to be inhibited by fluoride [84]. Therefore, the presence of fluoride riboswitch in the bacterial MAGs of the chalk sample indicates that chalk soil could contain higher fluoride concentration, and some bacterial species use fluoride riboswitches to regulate the expression of proteins that alleviate the deleterious effects of fluoride. In conclusion, the diversity within the riboswitch-based gene regulation in between the bacteria of chalk and basalt samples indicates that structured noncoding RNAs are the potential driver for generating bacterial adaptation in new environments, which could play a significant role in bacterial evolution, including the sympatric speciation. Biology 2022, 11, x FOR PEER REVIEW 10 of 17

Genetic Divergence between the Two Microsites
We have observed a significant difference in functional enrichment and noncoding structured RNAs between the bacteria from chalk and basalt samples. We hypothesized that the bacterial community evolved in two soil types to adapt to local ecological stress, where some bacterial species might be involved in SS. Sympatric speciation could explain the origination of new species from the ancestral species, while both are found in different ecological abutting contrasts. Although, in most cases, the complete genome information could not be mapped from metagenomic data, we can map and compare the large segment of the genome sequence of particular species to check if they are significantly different between two samples, which could help to infer the potential sympatric speciation in bacteria. To look at the potential sympatric speciation between the chalk and basalt soil samples, we performed the mapping to the reference database of soil bacteria and archaea containing the CDS sequences of 888 bacteria and 34 archaea. Interestingly, we found that most of the sympatric species were also found to be highly abundant at the genus level. Species-level pairwise distance based on single nucleotide variants (SNVs) between the chalk and basalt populations was calculated using the metaSNV pipeline [34], and it showed a significant difference in the same species between the two soil types ( Figure S7). In PCoA, based on pairwise distance, we have identified more than 20 species showing clear-cut separate clusters ( Figure S7). Furthermore, these species also have differences in SNP composition between chalk and basalt ( Figures 5 and S8-S25), PCA (Figures 5A and S8A-S25A), phylogenetic tree ( Figures 5B and S8B-S25B), population STRUCTURE analysis ( Figure 5C), and genetic networking ( Figure 5D) showed separate genetic clusters of basalt and chalk, suggesting sympatric divergence. The genetic distance between the chalk and basalt soil populations of Streptomyces lividans measured by FST is 0.058, suggesting differentiation with gene flow. The nucleotide diversity (π) for basalt and chalk is 3.6 × 10 −3 and 2.8 × 10 −3 , respectively. FST and nucleotide diversity (π) of other species were also calculated for these species based on SNPs (Table S5). However, this analysis failed to classify if they are new strains or new species, as retrieving complete genome sequence information of all these species was not possible from the metagenomic data. However, this finding indicates that several bacterial species change their genomic sequences to evolve and adapt to a new environment that could drive the generation of new species.

Genetic Divergence between the Two Microsites
We have observed a significant difference in functional enrichment and noncoding structured RNAs between the bacteria from chalk and basalt samples. We hypothesized that the bacterial community evolved in two soil types to adapt to local ecological stress, where some bacterial species might be involved in SS. Sympatric speciation could explain the origination of new species from the ancestral species, while both are found in different ecological abutting contrasts. Although, in most cases, the complete genome information could not be mapped from metagenomic data, we can map and compare the large segment of the genome sequence of particular species to check if they are significantly different between two samples, which could help to infer the potential sympatric speciation in bacteria. To look at the potential sympatric speciation between the chalk and basalt soil samples, we performed the mapping to the reference database of soil bacteria and archaea containing the CDS sequences of 888 bacteria and 34 archaea. Interestingly, we found that most of the sympatric species were also found to be highly abundant at the genus level. Species-level pairwise distance based on single nucleotide variants (SNVs) between the chalk and basalt populations was calculated using the metaSNV pipeline [34], and it showed a significant difference in the same species between the two soil types ( Figure S7). In PCoA, based on pairwise distance, we have identified more than 20 species showing clear-cut separate clusters ( Figure S7). Furthermore, these species also have differences in SNP composition between chalk and basalt ( Figures 5 and S8-S25), PCA ( Figures 5A and S8A-S25A), phylogenetic tree ( Figures 5B and S8B-S25B), population STRUCTURE analysis ( Figure 5C), and genetic networking ( Figure 5D) showed separate genetic clusters of basalt and chalk, suggesting sympatric divergence. The genetic distance between the chalk and basalt soil populations of Streptomyces lividans measured by F ST is 0.058, suggesting differentiation with gene flow. The nucleotide diversity (π) for basalt and chalk is 3.6 × 10 −3 and 2.8 × 10 −3 , respectively. F ST and nucleotide diversity (π) of other species were also calculated for these species based on SNPs (Table S5). However, this analysis failed to classify if they are new strains or new species, as retrieving complete genome sequence information of all these species was not possible from the metagenomic data. However, this finding indicates that several bacterial species change their genomic se-quences to evolve and adapt to a new environment that could drive the generation of new species. Further experimental studies on sequencing each of the genomes of these bacteria and their phylogenomic analysis could identify many new bacterial species, which might be generated from the SS. Further experimental studies on sequencing each of the genomes of these bacteria and their phylogenomic analysis could identify many new bacterial species, which might be generated from the SS.

Discussion
SS has been controversial since Darwin suggested it (Darwin 1859). The key problem is whether they have allopatric history, and rare cases of SS admitted were from islands or small places [85]. In the current study, the SS arena, dubbed EP, was formed by a volcanic eruption around one Mya in chalk regions, which is like a basaltic island floating on a chalk ocean (Figure 1). The isolated basaltic island precludes the allopatric possibility, especially in the abutting regions, which is like the case of palm from abutting contrasting soils [86,87]. Our previous studies identified numerous SS events in EC and EP. Several SS across life, from bacteria to mammals, have been identified at EC, including the soil bacterium Bacillus simplex [32], Hordeum spontaneum [37], wild emmer wheat [88], Triticum dicoccoides [89], crucifer Ricotia Lunaria [90], Oryzaphilus surinamensis [91], etc. Similarly, several SS cases were identified from the EP [24,25,39,40], which was suggested to be an SS hotspot [92].
The central questions of bacterial ecology and evolution require a method to consistently demarcate, from the vast and diverse set of bacterial cells within a natural community, the groups playing ecologically distinct roles (ecotypes). Because of a lack of theorybased guidelines, current methods in bacterial systematics fail to divide the bacterial domain of life into meaningful units of ecology and evolution [26,93]. We introduce a sequence-based approach ("ecotype simulation") to model the evolutionary dynamics of bacterial populations and to identify ecotypes within a natural community, focusing here

Discussion
SS has been controversial since Darwin suggested it (Darwin 1859). The key problem is whether they have allopatric history, and rare cases of SS admitted were from islands or small places [85]. In the current study, the SS arena, dubbed EP, was formed by a volcanic eruption around one Mya in chalk regions, which is like a basaltic island floating on a chalk ocean (Figure 1). The isolated basaltic island precludes the allopatric possibility, especially in the abutting regions, which is like the case of palm from abutting contrasting soils [86,87]. Our previous studies identified numerous SS events in EC and EP. Several SS across life, from bacteria to mammals, have been identified at EC, including the soil bacterium Bacillus simplex [32], Hordeum spontaneum [37], wild emmer wheat [88], Triticum dicoccoides [89], crucifer Ricotia Lunaria [90], Oryzaphilus surinamensis [91], etc. Similarly, several SS cases were identified from the EP [24,25,39,40], which was suggested to be an SS hotspot [92].
The central questions of bacterial ecology and evolution require a method to consistently demarcate, from the vast and diverse set of bacterial cells within a natural community, the groups playing ecologically distinct roles (ecotypes). Because of a lack of theory-based guidelines, current methods in bacterial systematics fail to divide the bacterial domain of life into meaningful units of ecology and evolution [26,93]. We introduce a sequence-based approach ("ecotype simulation") to model the evolutionary dynamics of bacterial populations and to identify ecotypes within a natural community, focusing here on two Bacillus clades surveyed from the EC in Israel. This approach has identified multiple ecotypes within traditional species, with each predicted to be an ecologically distinct lineage; many such ecotypes were confirmed to be ecologically distinct, with specialization to different canyon slopes with different solar exposures.
In this study, high-throughput shotgun metagenomics sequencing and advanced bioinformatics algorithms enabled us to investigate microbial abundance and diversity under two sharply contrasting abutting divergent soil types. For example, microbial abundance at the phylum level shows that Actinobacteria was significantly highly abundant in basalt soil, while Bacteroidetes, Gemmatimonadetes, Firmicutes, and Verrucomicrobia were significantly highly abundant in chalk soil (Figure 2). The Actinobacteria are significantly higher in basalt soil, and Proteobacteria are significantly higher in chalk soil ( Figure 2D), which is similar to a previous study [94] and was explained by the lower water availability in chalk than in basalt [94]. The basalt soil in Israel is typically slightly acidic [36]. Microbial abundance at the genus level shows that Streptomyces, Micromonospora, Kribbella, Microbacterium, and Frankia were significantly abundant in basalt soil samples. Firmicutes are higher in the dry chalk ( Figure 2D), which is probably because the mild water stress could increase the relative abundance [95], and lower abundance members of the Firmicutes could facilitate bioremediation of acid basalt soil [96], where Gemmatimonas were significantly highly abundant in chalk soil samples.
Furthermore, we observed significant differences in functional enrichment between the bacteria from chalk and basalt soil. We found that most of the COG annotations were involved in microbial metabolism ( Figure S4), and the relative abundance of cluster III (metabolism) was the highest among the four clusters. These functional differences help the bacterial community in specific soil adapt to the local ecology. Next, to check if there were any differences in non-coding RNA-based gene regulations, we performed the riboswitch analysis in the bacteria from two different soil types. Interestingly, we observed the differences in preferences of different metabolite sensing riboswitch classes. Furthermore, we detected differences in the GC content of highly conserved TPP riboswitch sequences between the bacteria from chalk and basalt soils, which indicates that these differences were generated due to the adaptive mutations that could have a significant role in the bacterial response to specific ecologies. All these findings decipher that the differential patterns of non-coding structural RNAs could alter the regulatory patterns of the bacteria, which could drive the soil bacterial evolution in microclimatic ecological differences.
Furthermore, we have also identified several sympatric species between the chalk and basalt soil populations, and many of them were also found to be highly abundant at the genus level in both populations. Overall, this study opens a new paradigm of soil bacterial evolution and sympatric speciation. Furthermore, our finding suggests that mutations in the functional genes and in the structured non-coding RNA are continuous processes that could help the soil bacterial community evolve continuously and adapt to ecological changes. The divergence of each species between the chalk and basalt soils showed clearly separate clusters, which were caused by the contrasting ecological stresses, either water availability or pH values, or both. Although there is no physical barrier between the two soils, the strong selection would overrule the gene flow between them, and the cumulated mutations facilitated speciation between them, just as in other species from the same speciation arena [24,25], which were proved to be SS. Since Evolution Canyons (ECs) and Evolution Plateaus (Eps) are numerous on our planet, SS appears to be a common model of the origin of species. Clearly, microsites divergent ecologically, geologically, edaphically, climatically, abiotically, and biotically are numerous across the planet; hence, SS, first hypothesized by Darwin, is proved to be a common model of speciation, not only in Israel but globally.

Conclusions
Soil bacteria tend to evolve rapidly in response to new environmental stress. However, the evolution of soil bacteria under microclimatic ecological differences is largely unexplored. In this study, we performed a comparative metagenomic analysis of two sharply divergent soil types and investigated the evolution of soil bacteria. Our findings elucidate the significant divergence of bacterial taxonomic compositions between two soil types. Furthermore, we found that the bacterial community from two soil types significantly differ in functional enrichment and preferences in RNA-based gene regulations. Hence, these findings suggested that the bacterial community diverged and functionally evolved to adapt to local ecological stress. Next, we detected several soil bacterial species have significant genetic divergence between the two soil types, which could support their evolution towards sympatric speciation. Thus, our study provided detailed insights into the soil bacterial evolution and incipient sympatric speciation under microclimatic ecological differences and encouraged further research in this area to uncover the evolutionary patterns of soil bacteria in the face of new ecological stresses.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/biology11081110/s1, Figure S1: Spearman correlations between all the samples from basalt and chalk; Figure S2: Level of significance for relative abundances of all Genera in soil samples; Figure S3: Level of significance for relative abundances of all CaZy in soil samples; Figure S4: Functional categories for COGs; Figure Table S1: List of articles on sympatric species; Table S2: Statistics of raw sequenced reads; Table S3: Statistics of assembled contigs; Table S4: Species identified with LEfSe that differed significantly in basalt and chalk, LDA SCORE (log10) > 2; Table S5: Sympatric divergence of the single bacteria of most 19 abundant bacteria, F ST , and nucleotide diversity (π) were calculated.