Metagenomics Uncovers a Core SAR11 Population in Brackish Surface Waters of the Baltic Sea

: The Baltic Sea represents one of the largest brackish ecosystems where various environmental factors control dynamic seasonal shifts in the structure, diversity, and function of the planktonic microbial communities. In this study, despite seasonal ﬂuctuations, several bacterial populations ( < 2% of the total OTUs) that are highly dominant (25% of relative abundance) and highly frequently occurring ( > 85% of occurrence) over four seasons were identiﬁed. Mathematical models using occurrence frequency and relative abundance data were able to describe community assembly persisting over time. Further, this work uncovered one of the core bacterial populations phylogenetically a ﬃ liated to SAR11 subclade IIIa. The analysis of the hypervariable region of 16S rRNA gene and single copy housekeeping genes recovered from metagenomic datasets suggested that the population was unexpectedly evolutionarily closely related to those inhabiting a mesosaline lacustrine ecosystem rather than other marine / coastal members. Our metagenomic results further revealed that the newly-identiﬁed population was the major driver facilitating the seasonal shifts in the overall community structure over the brackish waters of the Baltic Sea. The core community uncovered in this study supports the presence of a brackish water microbiome distinguishable from other marine and freshwater counterparts and will be a useful sentinel for monitoring local / global environmental changes posed on brackish surface waters.


Introduction
The Baltic Sea represents a semi-enclosed aquatic environment that receives high freshwater load but has limited connection to the ocean. The Baltic Sea thus displays a horizontal salinity gradient over geographical distance, while the major part is largely characterized by brackish conditions. Studies have documented the abundance and activity of microbial taxa associated with various environmental factors, advancing the understanding of community composition and diversity upon various stressors [1][2][3][4][5][6][7]. Microorganisms tend to grow faster with a smaller genome size relative to macroorganisms, facilitating rapid shifts in community structure and diversity upon anthropogenic perturbations (e.g., eutrophication, pollution, and global warming) as well as natural environmental gradients. With the changing environmental conditions under global warming scenarios, aquatic microbial communities may exhibit dynamic shifts in microbial composition and diversity, which ultimately impacts the large scale alterations of carbon and nutrient cycles [8]. Examining microbial community composition, diversity, and assembly in this environment has important implications for not only microbial ecology but also diagnosis/monitoring of the type and degree of anthropogenic pressures being posed on the marine ecosystems.
Transitions in microbial composition along the salinity gradient are well described in the Baltic Sea and other coastal/marine environments. In contrast to the lateral salinity gradient, salinity remains quite stable in a given area of the Baltic Sea over time. Studies have reported temporal variation of community structure and diversity from short-term (hours to days) to long-term (months to years) time scales. Among others, temperature, seasonal stratification/mixing, and phosphorous concentration are found to drive the strong seasonal community dynamics. Turnover in winter enriches members of Epsilonproteobacteria and Archaea to the surface waters from the deep habitat [1]. Actinobacteria, Flavobacteria, and Planctomycetes populations are highly enriched in autumn [1]. Cyanobacterial blooms in summer are highly correlated with the level of phosphorus [2]. These works documented how deterministic environmental factors affect community structure/dynamics and specific taxa, particularly in response to distinct environmental factors.
Microbial community composition, structure, and diversity greatly vary in a range of spatial and temporal scales, where it is hypothesized that there are major drivers that largely contribute to the shifts in community structure and function. Among others, a core community consists of organisms (and/or genes) that are ubiquitous or frequently occurring in a given habitat with variable environmental conditions. Recent studies have shown core members across many different ecosystems such as gut microbiomes, aquatic environments, and biological wastewater treatment plants where the core community plays critical roles in ecosystem functions [9]. For example, among thousands of microbial operational taxonomic units (OTUs) revealed through deep sequencing studies, the SAR11 clade is found to be frequently occurring and highly abundant in planktonic microbial communities in marine environments including the Baltic Sea. The genome streamlining and cellular phenotypes conferring fitness in oligotrophic environments help SAR11 be particularly competitive in the ocean [10]. The organisms sustain the ecosystem functions and services in not only the local habitats (e.g., Baltic Sea) but also global scales through carrying out essential roles in biogeochemical cycles of carbon, nitrogen, and sulfur [10]. Earlier studies report the ubiquity and numerical dominance of SAR11 bacteria in the Baltic Sea surface waters, implying their core membership [5,6]. However, these studies determined the abundance and occurrence of SAR11 bacteria at various taxonomic levels (e.g., from the entire SAR11 to the sub-lineage level). Since SAR11 bacteria (Pelagibacterales) are an Alphaproteobacteria order, consisting of members with diverse phylogenetic and functional diversity, it remains to be clearly elucidated at what taxonomic level the SAR11 bacteria hold the core membership in the Baltic Sea surface waters. Further, the previous studies documented the entire or sub-lineages of SAR11 as being frequently occurring in various ranges of habitats/periods. Since the Baltic Sea surface waters exhibit a variety of environmental conditions over the range of different spatiotemporal scales, where (e.g., a relatively narrow range of habitats/periods vs. all geographical/temporal sites) the core membership of the SAR11 bacteria is valid remains to be further described.
In the present study, we performed a comprehensive survey of exploring core members of planktonic microbial communities in the Baltic Sea surface waters using metagenomic datasets obtained a wide range of spatial (32 locations with a greatly varied salinity from freshwater to eusaline) and temporal (over four seasons in one location) scales. Our metagenomic approach using various target sequences (e.g., hypervariable 16S rRNA gene regions and single copy marker genes) could determine the core community at different taxonomic levels (from order down to sub-species population) and specify the range of spatial atlas and temporal scales where the core membership could be valid. Our findings not only provide quantitative insights into planktonic members making up the core community but also help researchers focus on the key drivers, other than numerous transient organisms (satellites), shaping the community structure and function, which will be a useful basis for manipulating ecosystem services and monitoring anthropogenic environmental changes.

Retrieving and Analyzing 16S rRNA Gene Sequences from Metagenomics Datasets
A total of 42 metagenomic datasets (Table S1 and Figure S1) that were obtained from Baltic Sea surface waters in the previous studies [1,4,5] were collected and analyzed in this study. Raw metagenomic reads were trimmed with a Q = 20 Phred quality score cutoff using SolexaQA2 [11]. To identify 16S rRNA gene-encoding sequences, metagenomic reads were searched using BLASTn with a cutoff of >70% nucleotide identity and ≥95 match length as described in [12], against the 16S rRNA gene database (hypervariable V6 region) [13]. Only the V6-encoding region of a read was considered, and the other regions of the read were trimmed for further analysis. The V6-encoding reads were analyzed using the MOTHUR pipeline [14], as described previously [15,16]. In brief, the read data were pre-processed using the following parameters: maxambig = 0, maximum length of homopolymer = 8, and other parameters at default settings. The pre-processed sequences were chimera-checked and classified using the commands chimera.vsearch and classify.seqs. The chimeric sequences and those assigned to chloroplast, mitochondria, unknown, archaea, and eukaryote were removed, after which 119-898 V6-encoding sequences were eventually obtained across all metagenomic datasets used in this study. A total of 100 sequences per dataset was randomly drawn for normalization across datasets. The sequences were clustered into representative OTUs based on a 97% nucleotide identity cutoff, using UCLUST (v9.2) [17]. The relative abundance of an OTU was estimated based on the number of sequences clustered into an OTU among the 100 sequences per dataset. Statistical testing for differential community characteristics (e.g., community composition) was conducted using the Mann-Whitney U test.

Modeling the Relationship between Occurrence Frequency and Relative Abundance
Five models (Nachman, Hanski-Gyllenberg, Power, Poisson, and Negative Binomial) were employed to describe the quantitative relationship between the observed occurrence frequency and relative abundance of OTUs across 31 samples taken in one location (LMO) over the four seasons. The model parameter and the sum of absolute differences for the goodness of fit were estimated as described previously [18,19].

The Baltic Sea Planktonic Ecosystem Contains the Core Microbial Community
A total of 31 metagenomic datasets originating from the same location (LMO) 3 in the central area of the Baltic Sea over the four seasons ( Figure S1) were first analyzed. All 31 samples displayed the oligotrophic and mesosaline condition ( Table 1). The observed occurrence frequency and relative abundance of OTUs across the 31 datasets were fitted to the five predictive models (Table S2). While the occurrence frequency-abundance data fit relatively well to the Nachman, Hanski-Gyllenberg, and Power models ( Figure 1A), the Nachman model was found to best describe (based on sub of absolute difference) the community assembly of the planktonic bacterial communities taken over seasons based on the lower sum of absolute difference (i.e., higher goodness of fit) (Table 1). Particularly, this analysis identified several OTUs (with red color) that were highly frequently occurring and abundant and many others (with blue color) with low occurrence frequency and low abundance. Sum of absolute difference and the coefficient of determination (R 2 ) were determined for estimating goodness of fit, as suggested previously [18,19]. µ and P represent the relative abundance and occurrence frequency, respectively. α, β, and k denote model fitting parameters.
The occurrence frequency-abundance relationship model is often used to describe the spatial distribution patterns rather than the temporal ones [25]. The core-satellite hypothesis describes the spatial distribution pattern of organisms with evolutionary processes including colonization, extinction, and immigration [26]. A positive correlation between the two variables has been generally observed in macroecology (e.g., various taxa such as insects, birds, and plants) [18]. Core members are shaped by interactions between taxa and niche specialization, whereas satellite ones result from random dispersal. The present and other recent studies [27] expanded the predictive framework to quantitatively describe the bacterial community assembly over time. Figure 1B shows the number of OTUs in relation to the occurrence frequency. Among a total of 403 OTUs, the majority (345) of OTUs were associated with low occurrence frequency (<20%). The number rapidly decreased with the increase of occurrence frequency: 35 (20-40% occurrence frequency), 14 (40-60%), 4 (60-80%), and 5 (80-100%). Despite a slight increase detected at the 80-100% occurrence frequency compared to that at 60-80%, the number of OTUs did not follow a bimodal Water 2020, 12, 501 5 of 11 distribution pattern (p > 0.05 by Mitchell-Olds and Shaw test). The OTUs were clustered using the occurrence frequency criteria used in previous studies [28]: core (≥80% occurrence frequency), intermittent (20-80%), and transient (≤20%) OTUs. Figure 1C displays the number and relative abundance of OTUs associated with the three defined OTU groups. Of the OTUs, 85% were affiliated to the transient group that occupied 31% of the communities over seasons. Of note was that the core group consisted of 1.3% of the total OTUs but accounted for a significant fraction (25%) of the communities in terms of average relative abundance.

Model Equation Parameter
Maximum Likelihood Sum of absolute difference and the coefficient of determination (R 2 ) were determined for estimating goodness of fit, as suggested previously [18,19]. and represent the relative abundance and occurrence frequency, respectively. α, β, and k denote model fitting parameters.  A number of bacterial planktons occur intermittently/transiently, associated with seasonal selection as previously described [3]. A previous study documented a contrasting seasonal pattern in microbial communities at the same site: predominant members (e.g., Bacteroidetes) feeding on complex carbohydrates in spring-early summer and specific lineages of Actinobacteria capable of utilizing Cyanobacteria-derived metabolites in autumn [3]. Despite the seasonal selection (e.g., fluctuating temperature) shaping the community assembly as previously reported, our modeling approach revealed the presence of a few core bacterial populations persisting on surface waters over seasons.

Sum of Absolute Difference
Some caveats can exist in identifying the core members using sequence datasets, which include but not limited to, biases associated with the PCR steps and choices of primers, databases, and target sequence regions [9]. The metagenomic datasets used in this study are without the PCR amplification of a target single gene (e.g., 16S rRNA gene), bypassing the biases related to the PCR steps and primers. A full-length 16S rRNA gene includes at least nine hypervariable sequence regions (V1 to V9), which have been popularly used for microbial community composition profiling. Although the V6 region-based results (as used in this study) is found to be comparable to those using full-length sequences [29], we also compared the V6 region-based results with those using another hypervariable region (V9), showing that both results were highly consistent with each other (data not shown). A frequently used approach in defining the core community utilizes the relative abundance or the presence/absence of microbial taxa, although other approaches benefitting from the information about phylogeny and species-species interaction are also described in [9]. Sequencing depth is an important factor affecting the identification of core bacteria. The sequencing depth directly influences the presence/absence of a taxon, since the current sequencing depth used for surveying many microbial communities can highly likely miss the rare components. Accordingly, many studies establish a cutoff (e.g., 50-90%) for the relative abundance and occurrence frequency (e.g., 80%) to circumvent the issue associated with the limited sequencing depth, as in the present study. The use of the larger sequencing depth would increase the number of core members, as described previously [9,30]. Accordingly, although future studies with increasing sequencing depth (e.g., with more than thousands of OTU sequences) may capture some more core members than those in this study, the core membership of the OTUs revealed in this study will still be valid, regardless of the relatively smaller dataset size.

The Core SAR11 Population is Distinct from Other Marine Counterparts
The presence of the frequently occurring and abundant OTUs over the four seasons led us to further determine the taxonomic identity. Five OTUs (OTU_01 through OTU_05) with 4-5% of average relative abundance and 80-87% of occurrence frequency was taxonomically analyzed (Table S2). OTU_01, OTU_02, and OTU_04 were related to Pelagibacteraceae, OTU_03 with Halomonadaceae, and OTU_05 with Flavobacteriaceae. The family-level taxonomic affiliation using the MOTHUR package was consistent with the sequence homology search results using BLASTn against the 16S ribosomal RNA sequences database (NCBI). The most abundant bacterial population (OTU_01) showed > 99% nucleotide sequence identity to members of the SAR11 subclade IIIa.
The intra-population structure of the Pelagibacterales was examined further. All metagenomic reads originating from one location (LMO) were mapped using >70% nucleotide identity on the nine single copy gene markers retrieved from the representative SAR11 bacteria. Figure 2 shows the marker gene coverage across the 31 datasets. The coverage of a representative subclade IIIa (QL1) was the highest (18.6 ± 13.2×), followed by subclade Ia populations: HTCC1062 (3.4 ± 2.2×), HTCC7211 (0.9 ± 0.5×), HTCC1016 (0.9 ± 0.6×), HIMB5 (0.7 ± 0.3×), and HTCC1002 (0.2 ± 0.2×). Other SAR11 populations showed less than 0.5× on average. Figure 3 illustrates the relative fraction of the read nucleotide identity (i.e., those mapped on the nine single-copy marker genes) against subclade IIIa (QL1, IMCC9063, and HIMB114). The majority of the reads mapped on HIMB114 and IMCC9063 showed 87-92% nucleotide identity. In contrast to a moderate level of genomic relatedness, the majority of the metagenomic read identity to QL1 was 95-100% (97% on average). It thus appeared that the highly frequently occurring (87% of occurrence frequency) and most abundant (5.3% of average relative abundance) populations in the Baltic Sea surface waters (LMO) were closely genomically related to QL1-like populations.
Water 2020, 12, x FOR PEER REVIEW 6 of 11 The presence of the frequently occurring and abundant OTUs over the four seasons led us to further determine the taxonomic identity. Five OTUs (OTU_01 through OTU_05) with 4-5% of average relative abundance and 80-87% of occurrence frequency was taxonomically analyzed (Table  S2). OTU_01, OTU_02, and OTU_04 were related to Pelagibacteraceae, OTU_03 with Halomonadaceae, and OTU_05 with Flavobacteriaceae. The family-level taxonomic affiliation using the MOTHUR package was consistent with the sequence homology search results using BLASTn against the 16S ribosomal RNA sequences database (NCBI). The most abundant bacterial population (OTU_01) showed > 99% nucleotide sequence identity to members of the SAR11 subclade IIIa.
The intra-population structure of the Pelagibacterales was examined further. All metagenomic reads originating from one location (LMO) were mapped using >70% nucleotide identity on the nine single copy gene markers retrieved from the representative SAR11 bacteria. Figure 2 shows the marker gene coverage across the 31 datasets. The coverage of a representative subclade IIIa (QL1) was the highest (18.6 ± 13.2×), followed by subclade Ia populations: HTCC1062 (3.4 ± 2.2×), HTCC7211 (0.9 ± 0.5×), HTCC1016 (0.9 ± 0.6×), HIMB5 (0.7 ± 0.3×), and HTCC1002 (0.2 ± 0.2×). Other SAR11 populations showed less than 0.5× on average. Figure 3 illustrates the relative fraction of the read nucleotide identity (i.e., those mapped on the nine single-copy marker genes) against subclade IIIa (QL1, IMCC9063, and HIMB114). The majority of the reads mapped on HIMB114 and IMCC9063 showed 87-92% nucleotide identity. In contrast to a moderate level of genomic relatedness, the majority of the metagenomic read identity to QL1 was 95-100% (97% on average). It thus appeared that the highly frequently occurring (87% of occurrence frequency) and most abundant (5.3% of average relative abundance) populations in the Baltic Sea surface waters (LMO) were closely genomically related to QL1-like populations.  SAR11 bacteria include four major ecotypes (subclade I, II, IIIa, and IIIb). Among subclade III, IIIb (freshwater LD12) dominates the planktonic microbial communities in some inland waters [31] whereas subclade IIIa (IMCC9063 and HIMB114) were thought to inhabit coastal/oceanic surface waters. Recent works have shown the assembly of metagenomic sequences into genomes, which helps improve the current understanding of the ecology of microorganisms that are hardly cultivable in laboratory settings such as SAR11 bacteria. Our previous work constructed a metagenome-assembled genome (MAG) of a novel SAR11 population, QL1, dominating a mesohaline lacustrine ecosystem on the Tibetan plateau [23,32]. Both 16S rRNA genes and representative single copy genes demonstrated the close genomic relatedness of the core population occurring in the site (LMO) to QL1.  To attempt the recovery of the metagenome-assemble genomes (QL1-like populations), we performed metagenome assembly with several metagenomic datasets that showed high genome coverage of QL1, using IDBA-UD [33] with a range of k-mer from 20 to 100 with a step increase of 20. Genome bins were then obtained based on the metagenome sequence read coverage, occurrence of unique marker genes, and tetranucleotide frequency, using MaxBin as described previously [34]. In contrast to the successful recovery of draft genomes using a similar approach in previous studies [23,32] the genome bins obtained using the metagenomic datasets used in this study showed low genome completeness, unfortunately. Assembling metagenomic data into high-quality genomes can be a complex task, which can be attributable to the inherent biological complexity (e.g., intra-species level genomic diversity and mobile genetic elements) as well as the highly demanding requirement of computer memory [33]. The analysis using housekeeping genes (e.g., more than five) that encode essential cellular functions has become a commonly used tool for determining the phylogenetic and taxonomic relationships among bacteria [35]. The analysis using several housekeeping genes could define clear species boundaries, which has a much higher discrimination power than those using small subunit ribonucleic acid (SSU rRNA) genes (16S rRNA gene) [36]. Further, the results of the analysis was found to be potentially comparable to the whole genome sequencing data [37] and highly correlated to many DNA-DNA hybridization results when assessing the degree of genetic relatedness and gene content conservation at down to intraspecies levels [38]. Hence, although the draft genome recovery of the core population was not successful, both 16S rRNA gene and nine housekeeping genes consistently supported the taxonomic/genomic relatedness of the core population in the Baltic Sea surface waters to the QL1 population, distinguishable from other coastal/marine SAR11 representatives.

Spatiotemporal Distribution of the QL1-Like Population in the Baltic Sea
The time-series distribution pattern of the QL1-like population in one same place (LMO) was characterized first. Figure 4 shows the marker gene coverage of the representative SAR11 bacteria over seasons. The temperature in winter and spring (3-7 • C) clearly differed from summer (7-19 • C) (Table S1) in the LMO. Figure S2 shows the seasonal variation of the total SAR11, subclade IIIa, and QL1, respectively, between winter-spring and summer. The total SAR11, subclade IIIa, and QL1 populations were significantly (p < 0.05 by Mann-Whitney U test) enriched in summer. No other SAR11 subclades/species were differentially abundant between the two seasons, suggesting that the differential abundance of the total SAR11 bacteria was attributable to that of subclade IIIa, particularly, QL1. Ordinary least squares regression (OSLR) analysis also suggested that QL1 coverage was positively correlated with the increase of temperature (Pearson correlation = 0.56 with p < 0.05) ( Figure 4A). We next carried out marker gene survey spanning over a wide spatial scale, using other 11 metagenomic datasets (GS659-GS694 and S80704) across 11 sites [3,4], in addition to the 31 datasets (LMO). The sample sites exhibited a salinity gradient range from freshwater to eusaline (up to 32 PSU) (Table S1 and Figure S1). Figure 4B shows the relative abundance of representative SAR11 bacteria along the salinity gradient. Subclade IIIb, IIIa, and Ia showed higher gene coverage (0.9, 2.8, and 2.1×) in the metagenomic datasets originating from freshwater, oligosaline-mesosaline, and polysaline-eusaline water samples, respectively. Herlemann [5] reported the three ecotypes of SAR11 bacteria dominating the freshwater, brackish, and marine environments in the Baltic Sea, using 454 pyrosequencing analysis of partial 16S rRNA gene sequences. Herlemann [5] later could narrow the taxonomic identity of the brackish SAR11 ecotype down to the SAR11 subclade IIIa using catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH). Dupont [4] also reported the dominance of SAR11 subclade IIIa in the brackish region using metagenomic datasets. The previous metagenomic study reported that the dominant IIIa population shares moderate-level genomic relatedness to a SAR11 IIIa representative (IMCC9063) with 70-80% nucleotide identity [4][5][6]. Since the genomic relatedness was far below 95% of average nucleotide identity often used for the current bacterial species demarcation [38], the metagenomic finding implied the presence of a novel SAR11 species in the oligosaline-mesosaline waters, closely related to Subclade IIIa, despite the unclear taxonomic identity of the population at the moment. The present work clearly showed high genomic relatedness (>95%) between the core SAR11 population and QL1 (Figure 3), revealing that the subclade IIIa population with seasonal fluctuations that dominate over the wide brackish surface areas of the Baltic Sea was a QL1-like population. A complete genome sequence of SAR11 reported recently [39] was not included in this analysis. Nevertheless, our conclusion on the dominance of the subclade IIIa population on the Baltic Sea surface waters would not be affected, since the organism that was whole-genome sequenced recently is clearly a member of subclade IIIb [39].
16S rRNA gene-based surveys reported that the QL1-like population is globally found in various marine and lacustrine environments [31]. In particular, a recent study revealed the dominance of the QL1-like population (>95% genomic relatedness) in the Caspian Sea [40], in addition to other brackish waters globally distributed (e.g., the Baltic Sea, Chesapeake Bay, Delaware Bay, Lake Nam Co, and Lake Qinghai) [23,31,41,42]. Metagenomics uncovered distinct features in phylogenetic diversity and gene content between freshwater and oceanic communities [43]. Taking the finding one step further, the present and other studies [3,5] collectively support the hypothesis that brackish waters exert We next carried out marker gene survey spanning over a wide spatial scale, using other 11 metagenomic datasets (GS659-GS694 and S80704) across 11 sites [3,4], in addition to the 31 datasets (LMO). The sample sites exhibited a salinity gradient range from freshwater to eusaline (up to 32 PSU) (Table S1 and Figure S1). Figure 4B shows the relative abundance of representative SAR11 bacteria along the salinity gradient. Subclade IIIb, IIIa, and Ia showed higher gene coverage (0.9, 2.8, and 2.1×) in the metagenomic datasets originating from freshwater, oligosaline-mesosaline, and polysaline-eusaline water samples, respectively. Herlemann [5] reported the three ecotypes of SAR11 bacteria dominating the freshwater, brackish, and marine environments in the Baltic Sea, using 454 pyrosequencing analysis of partial 16S rRNA gene sequences. Herlemann [5] later could narrow the taxonomic identity of the brackish SAR11 ecotype down to the SAR11 subclade IIIa using catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH). Dupont [4] also reported the dominance of SAR11 subclade IIIa in the brackish region using metagenomic datasets. The previous metagenomic study reported that the dominant IIIa population shares moderate-level genomic relatedness to a SAR11 IIIa representative (IMCC9063) with 70-80% nucleotide identity [4][5][6]. Since the genomic relatedness was far below 95% of average nucleotide identity often used for the current bacterial species demarcation [38], the metagenomic finding implied the presence of a novel SAR11 species in the oligosaline-mesosaline waters, closely related to Subclade IIIa, despite the unclear taxonomic identity of the population at the moment. The present work clearly showed high genomic relatedness (>95%) between the core SAR11 population and QL1 (Figure 3), revealing that the subclade IIIa population with seasonal fluctuations that dominate over the wide brackish surface areas of the Baltic Sea was a QL1-like population. A complete genome sequence of SAR11 reported recently [39] was not included in this analysis. Nevertheless, our conclusion on the dominance of the subclade IIIa population on the Baltic Sea surface waters would not be affected, since the organism that was whole-genome sequenced recently is clearly a member of subclade IIIb [39].
16S rRNA gene-based surveys reported that the QL1-like population is globally found in various marine and lacustrine environments [31]. In particular, a recent study revealed the dominance of the QL1-like population (>95% genomic relatedness) in the Caspian Sea [40], in addition to other brackish waters globally distributed (e.g., the Baltic Sea, Chesapeake Bay, Delaware Bay, Lake Nam Co, and Lake Qinghai) [23,31,41,42]. Metagenomics uncovered distinct features in phylogenetic diversity and gene content between freshwater and oceanic communities [43]. Taking the finding one step further, the present and other studies [3,5] collectively support the hypothesis that brackish waters exert strong selective pressure shaping community assembly, including selective enrichment of core populations (e.g., QL1) that are frequently occurring and highly abundant.
In the present study, our metagenomic analysis (e.g., using the hypervariable region of SSU rRNA gene and many single-copy housekeeping genes) revealed a SAR11 species population in the Baltic Sea. This study provided the atlas (i.e., brackish waters) of the population noticeably conserved over the Baltic Sea surface waters and revealed that the population was the key driver causing the seasonal shifts in the SAR11 community structure at higher taxonomic levels (e.g., the total SAR11 clade and the subclade IIIa). The results of this study thus can help researchers focus on the core member persistent in the given habitat, which is thought to be tightly linked to the resilience and function of the ecosystem, rather than temporary and instable members. However, we still lack the understanding of how the population ecologically sustain the ecosystem functioning and health, therefore suggesting further studies particularly focusing on the gene content/expression and ecological role of the core community. For example, what metabolism genes/proteins constitute the functions of the core community and how much essential the degree/redundancy of the functions is for maintaining the ecosystem functioning and health and recovering from external perturbations such as local and global environmental stressors. Although cultivating SAR11 bacteria in laboratory settings were thought to be extremely difficult, the successful cultivation of SAR11 bacteria using a distinction-to-extinction method has emerged [39]. We therefore suggest future isolation of more representative SAR11 ecotypes persistent in the brackish waters and their genomic, transcriptomic, and phenotypic characterizations in relation to their ecological attributes. Ecological insight into the core community can be enriched by multi-omics techniques that can assess in situ activities, which will help expand our understanding beyond the taxonomically characterized identity to the ecosystem functioning/resilience. The genomic, transcriptomic, and phenotypic findings will also advance the current understanding of the distinct evolutionary path (i.e., ecological specialization of the brackish condition) undertaken by the brackish bacteria and allow a more systematic assessment of their roles in the local and global brackish ecosystems.