Article

Comparative Genomic Analysis of Cosmopolitan Dominant Cyanobacteria Microcoleus vaginatus and Microcystis aeruginosa

1
CAS Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
2
College of Advanced Agricultural Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
3
Modern Seed Industry College, Henan Vocational College of Agriculture, Zhengzhou 451450, China
*
Authors to whom correspondence should be addressed.
Phycology 2026, 6(2), 64; https://doi.org/10.3390/phycology6020064
Submission received: 20 April 2026 / Revised: 27 May 2026 / Accepted: 4 June 2026 / Published: 7 June 2026

Abstract

Cyanobacteria inhabit ecosystems ranging from oligotrophic deserts to eutrophic lakes, yet it remains unclear whether distantly related species that dominate disparate habitats share common genomic features or show divergent specialization. Here, we established a comparative framework of Microcoleus vaginatus, the pioneer stabilizer of biocrusts, and Microcystis aeruginosa, a major cause of freshwater blooms worldwide. Our dataset comprises 504 high-quality cyanobacterial genomes, including 132 M. vaginatus, 148 M. aeruginosa, and 224 reference taxa, for analyses of genome architecture, functional repertoires, and genomic plasticity. Both focal lineages showed signatures of extensive horizontal gene transfer and shared a small set of conserved orthologous groups, annotated as a FAD-dependent oxidoreductase, a manganese efflux pump, and a class II aldolase. Nevertheless, the two lineages followed distinct genomic strategies. M. vaginatus expands regulatory breadth and stress-resilience gene families, whereas M. aeruginosa shows evidence of genome streamlining and rapid nutrient exploitation. Notably, we hypothesize that aquatic M. vaginatus strains retain ancestral terrestrial genomic features while gradually acquiring potential aquatic-specific adaptations. Together, these results reveal a two-tier architecture associated with cyanobacterial dominance, in which a small conserved core is overlaid by lineage-specific genomic strategies, and provide a testable hypothesis for how cyanobacterial lineages may respond to global change pressures.

1. Introduction

Cyanobacteria are among the oldest oxygenic photoautotrophs on Earth and occupy nearly every light-bearing habitat, from open oceans and freshwater systems to desert soils and polar rocks [1,2,3]. Through oxygenic photosynthesis and biogeochemical cycling of C, N, and S, they continue to shape the atmosphere and the modern biosphere [4,5]. Despite this pervasive success, the genomic logic associated with the dominance of distantly related cyanobacteria in strikingly different niches remains an unresolved question in microbial evolution.
Comparing Microcoleus vaginatus and Microcystis aeruginosa (hereafter M.v and M.a, respectively) may help to address this question. M.v is the canonical pioneer and stabilizer of biological soil crusts (known as Earth’s living skin [6]), playing a crucial role in underpinning soil multifunctionality and dryland restoration [7,8]. M.a is a globally invasive, harmful-bloom-forming species that periodically overwhelms eutrophic freshwaters and coastal transition zones, impairing water quality, the integrity of the aquatic food web, and public health [9,10,11,12]. The two species diverged more than 1 billion years ago and occupy distinct habitats [13], yet both achieve monopolistic dominance within their realms [10,14,15,16]. Whether this parallel success reflects shared genomic features, lineage-specific specialization, or a combination of both is not known [17,18].
Ecological success is ultimately encoded at the genomic level, where architecture, gene content, and plasticity jointly determine metabolic flexibility, stress tolerance, and competitive capacity [3,19,20]. Transitions between terrestrial and aquatic lifestyles, in particular, typically demand substantial genomic reorganization [21,22]. These life-history shifts are sculpted by positive selection, homologous recombination, and gene gain/loss, which together shape pan-genome plasticity and niche specialization [23,24,25].
Previous work has identified several traits that may contribute to single-lineage dominance, including EPS-mediated filament bundling, surface motility, specialized signal transduction, and phycosphere-based C/N exchange in M.v [26,27,28], and colony formation, gas-vesicle buoyancy, toxin production, antiviral systems, and efficient nutrient storage in M.a [2,29,30]. A comparative analysis of the two species might help clarify which features are lineage-specific adaptations and which belong to a broader cyanobacterial dominance toolkit. Moreover, although genome streamlining is frequently invoked for aquatic specialists [31,32], many terrestrial cyanobacteria retain the capacity to thrive in liquid culture, and M.v strains are increasingly reported from aquatic habitats [7,33]. It remains unclear whether transitions between terrestrial and aquatic habitats reshape genomic plasticity and functional repertoires while retaining ancestral traits, and whether such transitions pass through identifiable genomic intermediates.
To address these gaps, we compiled a comparative dataset comprising 504 high-quality cyanobacterial genomes, including both terrestrial and aquatic ecotypes of bundle-forming M.v (n = 132) and bloom-forming M.a (n = 148), along with 224 additional cyanobacterial reference genomes. By integrating analyses of genome architecture, functional gene repertoires, genomic plasticity, and lineage-specific molecular evolution, complemented by metabolic modeling, we aimed to characterize the genomic divergence between the focal lineages and to provide a predictive basis for future experimental studies on how these lineages may respond to global-change pressures such as desertification and eutrophication.

2. Materials and Methods

2.1. Data Collection and Quality Control

We constructed a comprehensive dataset comprising 280 high-quality genomes, including 132 M.v genomes representing both terrestrial and aquatic ecotypes, and 148 M.a genomes from aquatic environments. Of these, we sequenced 57 M.v genomes (Table S1; BioSample SAMN38845278–SAMN38845336, marked as ‘Sequenced’ in the Source column), and the remaining genomes were downloaded from NCBI (marked as ‘Downloaded’). Genomes assigned to M.v were selected by the diagnostic 11 bp insertion in the 16S rRNA gene [14]. Although many genomes bearing the diagnostic sequence are currently deposited as ‘Microcoleus sp.’ in public databases, we refer to them as M.v due to a lack of comprehensive taxonomic information and for consistency with prior ecological literature, while not aiming to resolve species boundaries here. However, some of them may be taxonomically reassigned in future revisions. Additionally, the genome of strain M.a CS-567/02-A1 was filtered out due to its low similarity to other M.a genomes. We assessed each genome using CheckM v1.0.18 to meet stringent quality standards, with ≥95% completeness and ≤5% contamination [34,35]. To ensure the evolutionary coherence of each focal species, we verified that both M.v and M.a populations maintain robust intra-species gene flow (Figure S1). The homoplasy-to-nonhomoplasy ratio (h/m) across increasing numbers of genome samples was calculated using ConSpeciFix v1.3.0 to quantify the extent of gene flow [36]. In addition, we retrieved 245 cyanobacterial genomes marked as ‘reference’ in NCBI, of which 224 strains with >95% completeness were selected for our comparative analysis (Table S1). Reference cyanobacterial genomes were included to provide a phylum-wide background for evaluating whether features observed in M.v and M.a are relatively exceptional or broadly conserved. 
However, we acknowledge that phylum-wide comparisons of genomic features (such as genome size) may be confounded by phylogenetic structure; therefore, we additionally performed order-restricted comparisons using Oscillatoriales (n = 33) and Chroococcales (n = 24) subsets to reduce phylogenetic noise.
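The completeness and contamination thresholds described in this section can be illustrated with a minimal Python sketch. The record type `GenomeQC`, the function `passes_qc`, and the example strain names and values are hypothetical; the sketch assumes that the two percentages have already been parsed from a CheckM-style summary and does not reproduce the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class GenomeQC:
    """Hypothetical record holding quality estimates for one assembly."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_qc(genome, min_completeness=95.0, max_contamination=5.0):
    """Apply the >=95% completeness and <=5% contamination thresholds."""
    return (genome.completeness >= min_completeness
            and genome.contamination <= max_contamination)


# Illustrative values only, not taken from Table S1.
genomes = [
    GenomeQC("strain_A", 99.1, 0.4),
    GenomeQC("strain_B", 94.8, 1.2),  # fails on completeness
    GenomeQC("strain_C", 97.3, 6.0),  # fails on contamination
]
kept = [g.name for g in genomes if passes_qc(g)]
print(kept)
```

Both thresholds are treated as inclusive here, matching the "≥" and "≤" signs used above.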

2.2. Genome Annotation and Architectural Characterization

Protein-coding genes were predicted across all genome assemblies using Prodigal v2.6.3 [37]. Genome size, GC content, and gene density were calculated using a custom R script (genome_characteristic_boxplot.R; available, like the other custom scripts named below, in the GitHub repository listed in the Data Availability Statement). The GC content at the third codon position (GC3s) and the codon adaptation index (CAI) were determined using CodonW v1.3 (Peden, http://codonw.sourceforge.net/, accessed on 1 June 2021). Before the formal analysis of genomic datasets, all ribosomal protein genes annotated by Prokka were extracted for a preliminary run to generate the cai.coa file required for subsequent CAI computation. We predicted tRNA genes using tRNAscan-SE v2.0.5 [38]. Repetitive sequences and small RNAs were identified via RepeatMasker v4.0.8 (http://www.repeatmasker.org, accessed on 15 May 2023) and Infernal v1.1.5 against the Rfam database [39], respectively. For functional classification, all protein-coding sequences were annotated against the EggNOG database using eggNOG-mapper v2.1.13 to assign clusters of orthologous groups (COG) categories [40]. Genome-scale metabolic models were reconstructed using CarveMe v1.6.1 [41], and specific metabolic pathways were further extracted through COBRApy v0.30.0 [42].
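As an illustration of the architectural metrics named above, the following Python sketch computes GC content, gene density, and an approximation of GC3s. It is not the authors' R script or CodonW itself; the function names are hypothetical, and the GC3s function assumes the common definition in which Met, Trp, and stop codons are excluded because they have no synonymous alternatives.

```python
# Codons without synonymous alternatives (Met, Trp) and stop codons
# (standard genetic code); assumed to be excluded from GC3s.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def gc3s(cds):
    """Approximate GC3s of one in-frame coding sequence."""
    cds = cds.upper()
    gc = total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in EXCLUDED_CODONS or any(b not in "ACGT" for b in codon):
            continue
        total += 1
        gc += codon[2] in "GC"  # True counts as 1
    return gc / total if total else 0.0


def gene_density(n_genes, genome_length_bp):
    """Number of predicted genes per megabase."""
    return n_genes / (genome_length_bp / 1e6)


print(gc_content("ATGCGC"), gc3s("ATGGCCAAATAA"), gene_density(850, 1_000_000))
```

In practice the per-gene GC3s values would be pooled over all coding sequences of a genome; the sketch handles a single sequence to keep the definition visible.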

2.3. Genomic Plasticity and Molecular Evolution Analysis

Orthologous group (OG) clustering was performed using OrthoFinder v2.2.7 [43], which employs the DIAMOND algorithm for sequence alignment. Based on the result file Orthogroups.GeneCount.csv, which contains gene counts per sample for each OG, we performed 300 permutations for each OG to calculate pan- and core-summary statistics. The pan-genome size was fitted to Heaps’ law, and core genome decay was modeled using an exponential decay function [44].
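The accumulation-curve procedure can be sketched as follows in Python. The sketch rests on several assumptions that are not stated in the text: each permutation is taken to be a random ordering of the genomes, the pan-genome curve is modeled as P(N) = κN^γ, and the fit is done by ordinary least squares on log-transformed values. The function names are hypothetical, and the exponential-decay fit of the core genome is omitted.

```python
import math
import random


def accumulation_curves(presence, n_permutations=300, seed=0):
    """Mean pan- and core-genome sizes after adding 1..G genomes.

    presence maps each genome name to the set of OGs it carries.
    """
    rng = random.Random(seed)
    genomes = list(presence)
    pan_sum = [0.0] * len(genomes)
    core_sum = [0.0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)  # one random genome ordering per permutation
        pan, core = set(), None
        for i, g in enumerate(order):
            pan |= presence[g]
            core = set(presence[g]) if core is None else core & presence[g]
            pan_sum[i] += len(pan)
            core_sum[i] += len(core)
    return ([s / n_permutations for s in pan_sum],
            [s / n_permutations for s in core_sum])


def fit_heaps(pan_sizes):
    """Least-squares fit of log P = log(kappa) + gamma * log N."""
    xs = [math.log(n) for n in range(1, len(pan_sizes) + 1)]
    ys = [math.log(p) for p in pan_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    return math.exp(my - gamma * mx), gamma


# Toy input with three genomes and four OGs (illustrative only).
toy = {"g1": {"OG1", "OG2"}, "g2": {"OG2", "OG3"}, "g3": {"OG2", "OG4"}}
pan_curve, core_curve = accumulation_curves(toy)
print(pan_curve, core_curve, fit_heaps(pan_curve))
```

Under this parameterization, a fitted γ above zero indicates that the pan-genome keeps growing as genomes are added, which is the sense in which a pan-genome is described as open.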
Putative horizontal gene transfer (HGT) regions were predicted using Alien Hunter v1.7, an application that calculates HGT boundaries using interpolated variable-order motifs [45]. Insertion sequences (IS) and prophage regions were identified by ISEScan v1.7.3 and PhiSpy v3.7.8 [46,47]. For defense systems, the restriction-modification (RM) system was screened against REBASE using DIAMOND v2.0.8.146 [48,49], and we used a custom Python v3.8.10 script (RM_system_stas.py) to identify complete RM systems. Type I systems required all three core subunits (RE, MT, S); Type II and III systems required both RE and MT; Type IIG and Type IV systems were counted directly as functional units. The total number of complete RM systems was summed across all types. CRISPRCasFinder v4.2.20 was used to find the CRISPR-Cas genetic architecture in the genomes [50]. We curated the natural competence gene set based on previous studies of natural transformation in Gram-negative bacteria and filamentous cyanobacteria [51,52].
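The completeness rules for RM systems can be expressed compactly in code. The Python sketch below is not the authors' RM_system_stas.py; the function name and input format are hypothetical. It assumes that each REBASE hit has already been labeled with an RM type and a subunit class, and that the number of complete systems of a multi-subunit type equals the count of its rarest required subunit, which is an assumption not stated in the text.

```python
from collections import Counter

# Subunits required for a complete system of each multi-subunit type.
REQUIRED_SUBUNITS = {
    "I": ("RE", "MT", "S"),
    "II": ("RE", "MT"),
    "III": ("RE", "MT"),
}
# Types counted directly as functional units.
STANDALONE_TYPES = ("IIG", "IV")


def count_complete_rm(hits):
    """Count complete RM systems from (rm_type, subunit) hits of one genome."""
    counts = Counter(hits)
    result = {}
    for rm_type, subunits in REQUIRED_SUBUNITS.items():
        # Assumption: the rarest required subunit limits the system count.
        result[rm_type] = min(counts[(rm_type, s)] for s in subunits)
    for rm_type in STANDALONE_TYPES:
        result[rm_type] = sum(c for (t, _), c in counts.items() if t == rm_type)
    result["total"] = sum(result.values())
    return result


# Illustrative hits for a single genome.
example_hits = [
    ("I", "RE"), ("I", "MT"),                 # Type I lacks S: incomplete
    ("II", "RE"), ("II", "MT"), ("II", "MT"),
    ("IIG", "RE-MT"),
    ("IV", "RE"),
]
print(count_complete_rm(example_hits))
```

A stricter variant could additionally require the subunits of one system to be adjacent on the chromosome; that constraint is not described in the text and is therefore left out.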
The protein sequences of core OGs present across all strains in each group were aligned by MAFFT v7.453 using the L-INS-i strategy (--localpair --maxiterate 1000) for high-accuracy alignments [53], and back-translated to codon alignments based on the corresponding nucleotide sequences. We used KaKs_Calculator v2.0 to calculate the pairwise non-synonymous substitution rate (Ka), synonymous substitution rate (Ks), and Ka/Ks for core homologous genes using the YN algorithm [54]. The substitution rates of paralogous genes were calculated by wgd v1 [55].
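The back-translation step, in which each aligned protein is threaded back onto its coding sequence so that gaps become codon-sized gaps, can be sketched in Python as follows. The function name is hypothetical and the tool used by the authors for this step is not named in the text. The sketch assumes an in-frame coding sequence and treats a single extra trailing codon as a stop codon.

```python
def back_translate(aligned_protein, cds):
    """Convert one aligned protein sequence into a codon alignment row."""
    residues = aligned_protein.replace("-", "")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Assumption: one extra trailing codon is a stop codon without a residue.
    if len(codons) == len(residues) + 1:
        codons = codons[:-1]
    if len(codons) != len(residues):
        raise ValueError("coding sequence does not match protein length")
    codon_iter = iter(codons)
    # Each gap column becomes three gap characters; each residue
    # consumes the next codon in order.
    return "".join("---" if aa == "-" else next(codon_iter)
                   for aa in aligned_protein)


print(back_translate("M-K", "ATGAAATAA"))
```

Keeping gaps in multiples of three preserves the reading frame, which pairwise Ka and Ks estimation depends on.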

2.4. Lineage-Specific Functional Divergence and Evolution

A custom R script (conserved_elements_venn.R) was used to identify conserved and lineage-specific elements. In this script, a count file of orthologous genes or metabolic reactions was used as input, and the data were binarized into presence-absence matrices. Conserved elements were those with ≥95% prevalence within each target group. Elements present in >20% of the reference group were defined as widespread in Cyanobacteria and removed to focus on lineage-enriched signals; we note that this filtering can exclude some genuine core metabolic functions and should not be interpreted as their absence. A Venn diagram was employed to distinguish lineage-specific elements and shared elements. Subsequently, lineage-specific and shared OGs were selected as target OGs for comprehensive functional and evolutionary characterization.
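The filtering logic described above can be sketched in Python as follows. This is not the authors' conserved_elements_venn.R; the function names and the toy data are hypothetical. The sketch binarizes copy-number counts, keeps elements with ≥95% prevalence in a target group, removes elements found in >20% of reference genomes, and then assigns each remaining element to the Venn region defined by the groups that carry it.

```python
def prevalence(counts):
    """Fraction of genomes in which each element is present.

    counts maps genome name -> {element: copy number}.
    """
    seen = {}
    for genome_counts in counts.values():
        for element, copies in genome_counts.items():
            if copies > 0:  # binarize to presence or absence
                seen[element] = seen.get(element, 0) + 1
    return {e: k / len(counts) for e, k in seen.items()}


def lineage_enriched(groups, reference, conserved=0.95, widespread=0.20):
    """Conserved elements per group after removing widespread ones."""
    common = {e for e, p in prevalence(reference).items() if p > widespread}
    return {name: {e for e, p in prevalence(c).items() if p >= conserved} - common
            for name, c in groups.items()}


def venn_regions(sets):
    """Map each combination of groups to the elements found in exactly those groups."""
    regions = {}
    for element in set().union(*sets.values()):
        key = frozenset(n for n, s in sets.items() if element in s)
        regions.setdefault(key, set()).add(element)
    return regions


# Toy data (illustrative only).
groups = {
    "Mv": {"g1": {"OG1": 2, "OG2": 1, "OG3": 1},
           "g2": {"OG1": 1, "OG2": 0, "OG3": 1}},
    "Ma": {"h1": {"OG1": 1, "OG3": 1, "OG4": 1}},
}
reference = {"r1": {"OG3": 1}, "r2": {"OG3": 1}, "r3": {}, "r4": {}, "r5": {"OG1": 1}}
enriched = lineage_enriched(groups, reference)
print(venn_regions(enriched))
```

In the toy data, OG3 is removed because it occurs in 40% of reference genomes, whereas OG1 sits exactly at 20% and is retained because the filter is strict (>20%).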
For function, we extracted COG annotations of all lineage-specific and shared OGs. The top 12 dominant COG terms were retained, and low-frequency categories were combined into the Other category. Reciprocal searches and domain profiling via eggNOG-mapper result files were used to distinguish conserved OGs from their potential isozymes. Protein motifs were analyzed using MEME suite v5.5.9 [56]. For evolution, corresponding gene coordinates were retrieved from GFF files annotated by Prokka, and 10 kb flanking regions of target genes were generated using bedtools v2.27.1 [57]. A trusted HGT event was defined as a gene that overlapped an HGT region identified by Alien Hunter and had either a PhiSpy-identified prophage or an IS element within its 10 kb flanking sequences. Nucleotide sequences of genes from each specific/shared OG were extracted, and gene pairs within each group were analyzed by KaKs_Calculator v2.0 to obtain median Ka/Ks. For shared OGs, median Ka/Ks values were evaluated within each group and averaged.
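The definition of a trusted HGT event amounts to two interval-overlap tests, which the following Python sketch makes explicit. The function names are hypothetical. The sketch assumes 1-based closed coordinates as in GFF files, a single contig, and that the flanks are the 10 kb immediately upstream and downstream of the gene, excluding the gene body itself.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def is_trusted_hgt(gene, hgt_regions, mobile_elements, flank=10_000):
    """Apply the two-part rule to one gene on one contig.

    gene is (start, end); hgt_regions and mobile_elements (prophages or
    IS elements) are lists of (start, end) on the same contig.
    """
    start, end = gene
    in_hgt_region = any(overlaps(start, end, s, e) for s, e in hgt_regions)
    upstream = (max(1, start - flank), start - 1)
    downstream = (end + 1, end + flank)
    near_mobile = any(
        overlaps(*upstream, s, e) or overlaps(*downstream, s, e)
        for s, e in mobile_elements
    )
    return in_hgt_region and near_mobile


# Illustrative coordinates only.
gene = (20_000, 21_000)
print(is_trusted_hgt(gene, [(19_000, 25_000)], [(28_000, 29_000)]))
```

A multi-contig implementation would additionally compare contig identifiers before testing overlap and would clip the downstream flank at the contig end.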

2.5. Statistical Analysis and Visualization

All visualizations were generated using R v4.2.2, which included the ggplot2, ggridges, gghalves, VennDiagram, and corrplot packages. Correlations between genomic features and evolutionary indices were calculated using Spearman’s rank correlation. Statistical significance between groups was assessed using the Wilcoxon rank-sum test.
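For readers unfamiliar with rank-based correlation, the following Python sketch shows how Spearman's coefficient can be computed as the Pearson correlation of average ranks. It is a standard-library illustration with hypothetical function names, not the R code used for the analyses, and it does not compute p-values.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation coefficient of two equal-length lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: two features measured across five genomes.
print(spearman([0.8, 1.1, 1.5, 2.0, 2.4], [5.1, 5.0, 5.6, 6.2, 6.0]))
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic transformations of either variable, which suits genomic features with skewed distributions.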

3. Results

3.1. Divergence in Genomic Architecture and Functional Allocation

Comparative genomic analysis revealed pronounced architectural divergence between the two species, while the M.v ecotypes exhibited minimal differentiation (Figure 1A,B). Both M.v ecotypes maintained similar genome sizes (terrestrial strains: 7.36 ± 0.64 Mb; aquatic strains: 7.24 ± 0.53 Mb; Wilcoxon test, p > 0.05), which were significantly larger than those of M.a (4.86 ± 0.40 Mb; p < 0.001). In the constrained phylogenetic framework (Figure S2), M.v falls within the upper range of Oscillatoriales, whereas M.a shows no statistically significant divergence from Chroococcales. Despite this, M.v exhibits lower gene density (850 genes/Mb) than M.a (949 genes/Mb), suggesting a more complex genomic architecture of M.v that may accommodate additional regulatory elements or mobile genetic content. Furthermore, M.v tends to have higher repeat proportions than M.a, with the terrestrial strains of M.v peaking at the highest repeat ratios, and the aquatic strains showing similarly elevated repeat ratios (Figure S3). The total sRNA hit distribution of M.v showed a parallel trend, with aquatic strains at higher counts, followed by terrestrial strains, whereas M.a showed a skewed distribution toward lower counts (Figure S3).
GC content patterns were consistent with genome size (Figure 1B). The terrestrial and aquatic M.v ecotypes displayed significantly higher GC content (45.83 ± 0.20% and 45.67 ± 0.11%, respectively) than M.a (42.64 ± 0.28%, p < 0.001). This disparity was particularly prominent at GC3s, where M.v maintained consistently elevated levels (∼0.43), contrasting sharply with M.a (0.37 ± 0.01). For translation metrics, M.v exhibited a lower CAI than M.a (p < 0.001). Moreover, M.v ecotypes harbored elevated tRNA numbers (median: 76), far exceeding M.a (median: 41) and the upper quartile of the reference genomes.
Functional annotation revealed distinct patterns of gene allocation (Figure 1C). M.v genomes were enriched in transcriptional regulation (K), signal transduction (T), and carbohydrate metabolism (G). In contrast, M.a displayed a pronounced bias toward translation (J), cell cycle control (D), energy production and conversion (C), inorganic ion transport (P), and defense-related functions (V). Both M.v (terrestrial and aquatic strains) and M.a lineages maintained significantly higher proportions of L-class genes (replication, recombination, and repair) than reference genomes. Correlation analysis demonstrated that the relative abundance of L-class genes was positively associated with the proportion of IS elements (p < 0.001, Figure S4). Genome-scale metabolic model reconstruction further delineated distinct metabolic boundaries (Table S2). M.v lineages possessed metabolic features such as biotin biosynthesis, nitrite reductases [NAD(P)H], amide hydrolysis, and specific stress response signaling coupled with lipid and peptidoglycan remodeling capabilities. Aquatic M.v strains additionally possessed pathways for organic nitrogen assimilation and aromatic metabolism, including urea hydrolysis and tryptamine synthesis, which might be associated with the habitat transition. M.a possessed high-affinity potassium transporter, methionine salvage cycles, histidine biosynthesis, energy storage pathways, reactive nitrogen detoxification, and osmoadaptation mechanisms.

3.2. Genomic Plasticity and Maintenance Strategies

Pan-genome accumulation curves revealed that all three lineages maintain open pan-genome structures (Figure S5), as indicated by power-law parameters (γ) ranging from 0.237 to 0.269, reflecting ongoing horizontal gene acquisition and loss. They maintained substantially smaller core genomes relative to pan-genome sizes, with M.a exhibiting the most streamlined core genome. The open pan-genome structures necessitate corresponding genomic plasticity mechanisms to integrate incoming genetic material, a context in which the two focal species have evolved distinct genome management strategies. The proportion of putative horizontal gene transfer (HGT) regions was markedly higher in M.v and M.a than in reference genomes (Figure 2A), indicating pervasive horizontal acquisition. While the genome fraction occupied by IS remained broadly comparable across the three lineages, the composition-level diversity of IS differed, as reflected in IS family counts (Figure S6). In contrast, significant enrichment of prophages was observed in the M.a genomes.
To further assess DNA uptake ability as an additional route supporting HGT, we screened genomes for a competence-associated gene set (Table S3). Core components of the pilus/DNA uptake machinery were widely conserved across lineages, indicating broad retention of the basic transformation apparatus. Notably, the minor pilin fimT, the pilus assembly factor pilO, and the membrane-associated DNA-binding receptor comEA were generally present in M.v but nearly absent from M.a. In contrast, the pilin pilX was present in M.a but undetectable in M.v, and the secretin pilQ was absent from M.v but present in 32% of M.a genomes. Among the reference genomes, the prevalence of these genes ranged from 19% to 75%.
Defense systems modulate HGT rates and constitute crucial regulatory factors of genomic plasticity. M.v displayed heterogeneous CRISPR spacer distributions yet minimal investment in RM systems. Detailed system analysis revealed that Type II RM systems were most abundant across all three lineages. However, M.a showed significant enrichment for complete Type I and Type IIG systems, whereas M.v maintained a greater number of Type IV systems (Figure S7). Both aquatic lineages showed higher restriction enzyme abundance than terrestrial M.v (p < 0.01). Notably, aquatic M.v occupied intermediate positions across these metrics. The retention of elevated repetitive content, combined with increasing prophage integration, suggests a hypothesis that aquatic M.v could be progressively remodeling its genome to match aquatic selective pressures while retaining terrestrial adaptive features.

3.3. Lineage-Specific Divergence and Conservation of Gene Repertoires

Nucleotide-substitution rate analysis of core orthologs further supported the evolution of ecological adaptation. Median Ks values declined progressively from terrestrial M.v (0.120) through aquatic M.v (0.103) to M.a (0.083) (Figure S8A). While Ka/Ks ratios indicated predominant purifying selection (Figure 2C), M.v maintained significantly higher median Ka/Ks values. Paralogous pairs showed lower Ks but higher proportions under positive selection in M.v ecotypes than in M.a and the references (Figure S8B,C).
To identify lineage-specific adaptation in functional repertoires, we screened OG clusters that were either unique to or shared among lineages. After filtering out OGs present in >20% of reference genomes to exclude widely conserved functions, M.a possessed 343 exclusive OGs, substantially exceeding those of individual M.v ecotypes (Figure 3A). By contrast, the terrestrial and aquatic ecotypes of M.v shared 584 OGs, which may be associated with the ecological success of Microcoleus across terrestrial and aquatic systems. Functional distribution analysis revealed that this large shared gene pool of M.v was primarily invested in signal transduction (T) and cell wall/membrane biogenesis (M) (Figure 3B). Exclusive OGs of M.a were predominantly of unknown function or unmapped. A gene annotated as a cell envelope-related transcriptional attenuator was shared exclusively by aquatic M.v and M.a at the 95% conservation threshold, although it was also detected in most terrestrial M.v genomes (n = 81), where its prevalence fell below that threshold (Table S4). Three OGs shared across all lineages further suggest functions that may be associated with the widespread abundance and persistence of cyanobacteria. They were annotated as a FAD-dependent oxidoreductase, a manganese efflux pump (mntP), and a class II aldolase with an adducin N-terminal domain, respectively.
Given the possibility that distinct gene families may annotate to identical functions, we performed reciprocal searches on these conserved functional OGs and identified potential isozyme gene families (Table S4). Multiple OGs encoding FAD-dependent oxidoreductases are widely conserved across cyanobacteria. Domain profiling revealed that the homolog conserved in the M.v and M.a lineages possesses the FAD_binding_3 and Trp_halogenase domains (Figure 4). Another OG characterized by these domains was only present in two M.a strains. For the manganese efflux pump, additional OGs identified by the search were detected in only a limited subset of reference genomes. Regarding the class II aldolase, one additional OG was found to be conserved in M.v but absent in M.a, while another OG, missing in both lineages, was present in some reference genomes. All three aldolase OGs harbor the aldolase II catalytic domain. However, motif analysis revealed substantial structural variation. OG0002425 comprises a compact, conserved catalytic core of relatively small size. While preserving similar catalytic segments, OG0002653 incorporates an additional alkaline-enriched motif8 alongside an aromatic and hydrophobic motif1, and OG0010356 is characterized by a significant C-terminal extension (Figure S9).

3.4. Relationship Between Evolutionary Process and Lineage-Specific Genes

To further elucidate lineage-adaptive strategies, we analyzed the evolutionary drivers of lineage-specific gene families, including HGT, gene duplication, and selection pressure. Correlation analysis revealed significant synergistic relationships among genomic processes (Figure S10). The positive correlation among HGT, multicopy expansion, and Ks values (p < 0.001) suggests that lineage-specific OGs with stronger mobility-associated signals tend to be present as multiple copies and exhibit deeper synonymous divergence, consistent with longer residence times and subsequent within-lineage diversification. However, this pattern may also be influenced by detection biases and the statistical structure of pairwise Ks estimates. The strong positive correlation between median Ka/Ks and the Ka/Ks > 1 ratio (r = 0.57, p < 0.001) supported the use of median Ka/Ks as a summary metric consistent with adaptive evolutionary intensity.
Hotspot screening based on HGT ratio (>0.5) or Ka/Ks > 1 revealed distinct evolutionary trajectories of M.v and M.a, with genes bearing minimal reference genome counts highlighting candidate gene families potentially involved in lineage-associated specialization (Figure 5). M.v showed a higher proportion of lineage-specific genes with elevated Ka/Ks. Among these genes, only a few had informative functional annotations. For example, OG0006890 showed a high median Ka/Ks in aquatic M.v (Ref = 1) and was annotated as arginyl-tRNA synthetase (argS). Another annotated hotspot was an IS605 OrfB family transposase (Ref = 3), which was shared by both M.v ecotypes. Beyond these positive selection candidates, horizontal acquisition might further expand the M.v adaptive repertoire. Terrestrial and aquatic M.v appeared to maintain horizontally acquired elements annotated with stress-response functions, including a gene encoding a bacterial stress protein and grpE, which participates in the response to hyperosmotic and heat shock by preventing the aggregation of stress-denatured proteins. In M.a, by contrast, HGT-mediated acquisition of specialized metabolic modules appeared to be a more prevalent adaptive strategy than positive selection. Notably, both aquatic lineages may have independently acquired distinct orthologous groups annotated as clan AA aspartic protease via horizontal transfer. HGT hotspot genes showed substantially higher multicopy representation and broader distribution across reference genomes than positive selection hotspots, suggesting complementary adaptive strategies in our dataset.
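The hotspot criteria can be illustrated with a short Python sketch that summarizes one OG. The function name and input format are hypothetical. The sketch assumes that the Ka/Ks criterion is applied to the per-OG median of pairwise values and that the HGT ratio is the fraction of the OG's genes classified as trusted HGT events; neither detail is stated explicitly in the text.

```python
from statistics import median


def summarize_og(pairwise_kaks, hgt_flags, hgt_cutoff=0.5, kaks_cutoff=1.0):
    """Summarize one OG and flag it as a hotspot.

    pairwise_kaks: Ka/Ks values of gene pairs within the OG.
    hgt_flags: one boolean per gene, True if it is a trusted HGT event.
    """
    median_kaks = median(pairwise_kaks)
    frac_positive = sum(v > kaks_cutoff for v in pairwise_kaks) / len(pairwise_kaks)
    hgt_ratio = sum(hgt_flags) / len(hgt_flags)
    return {
        "median_kaks": median_kaks,
        "frac_kaks_gt1": frac_positive,
        "hgt_ratio": hgt_ratio,
        # Assumption: either criterion alone is sufficient.
        "hotspot": hgt_ratio > hgt_cutoff or median_kaks > kaks_cutoff,
    }


# Illustrative values only.
print(summarize_og([1.2, 1.4, 0.9], [False, False, False]))
print(summarize_og([0.2, 0.3, 0.4], [True, True, False]))
```

Reporting the median alongside the fraction of pairs with Ka/Ks > 1 mirrors the two quantities whose correlation is described in this section.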

4. Discussion

By comparing the genomes of M.v and M.a, this study identified a set of conserved and lineage-specific genomic features that may help explain why these two cyanobacterial species are among the most abundant and widespread members of their respective habitats. While acknowledging that both focal species complexes have substantial cryptic diversity [7,58], the main goal of this study is not to clarify their taxonomic boundaries. Rather, we focus on the genomic basis driving the ecological success of this important and dominant group of cyanobacteria.

4.1. Genomic Plasticity and Defense System Trade-offs

Our comparative analysis reveals that both M.v and M.a lineages maintain open pan-genomes and exhibit elevated HGT ratios compared to reference strains, consistent with the important role of DNA acquisition in microbial adaptation [59]. This high level of gene flow is likely supported by an enriched repertoire of repair-associated genes, which serve to mitigate the mutational and structural burden imposed by active exogenous DNA integration [60,61].
However, the mechanisms by which these two lineages acquire and manage exogenous DNA appear to differ. M.v retains a comparatively complete natural competence toolkit (Table S3). The near-universal presence of the minor pilin fimT, the inner-membrane assembly factor pilO, and the periplasmic receptor comEA may contribute to the capacity for efficient DNA uptake and translocation [52,62]. By comparison, M.a uniquely retains pilX, a factor promoting pilin polymerization that is often linked to biofilm and colony formation [63], suggesting a possible repurposing of its pilus machinery in support of a colonial lifestyle. The high prophage content of M.a, which provides superinfection exclusion and auxiliary metabolic genes [64,65], further suggests a contribution of phage-mediated transduction to HGT in this lineage. This difference is accompanied by distinct defense investment. M.v shows greater representation of CRISPR-Cas components associated with long-term immune memory against intermittent soil threats. In contrast, M.a prioritizes constitutive barrier defenses, such as RM systems and prophage-mediated exclusion, which might be consistent with sustained viral pressure in eutrophic waters despite its diverse CRISPR-Cas systems [30,66,67]. Notably, the observed genomic patterns may not solely reflect contemporary adaptive responses. Phylogenetic inertia likely plays a significant role, as the core repertoire of HGT and defense systems may be partially inherited from their respective common ancestors. Distinguishing between such historical legacies and active ecological adaptations remains a challenge, as genomic architectures are shaped by both ancestral baggage and adaptation.

4.2. Divergent Genomic Investment upon Conserved Physiological Requirements

The small conserved gene set may represent shared physiological requirements across habitats. It is worth noting that the basal metabolic genes essential for microbial survival were deliberately excluded in this study by the reference-prevalence filter (Section 2.4). This exclusion does not indicate that they are any less important than the conserved genes we detected, but only that they were not enriched in the lineages under investigation. The conservation of a putative FAD-dependent oxidoreductase is consistent with a common requirement for redox homeostasis in oxygenic phototrophs, where photosynthetic electron transport inevitably generates reactive oxygen species under fluctuating light and nutrient conditions [68]. Likewise, the mntP gene regulates manganese homeostasis and may contribute to tolerance of oxidative stress and maintenance of photosystem II function by limiting intracellular metal toxicity and supporting manganese availability for water splitting [69,70,71]. The class II aldolase with the N-terminal domain of adducin is widely conserved across multicellular life [72]. Its involvement in eukaryotic cytoskeleton assembly suggests potential functions in cell wall/membrane binding, population adhesion, and maintenance of cell morphology, which could facilitate the formation of cell bundles and colonies [26,29,73], although these roles require experimental validation. The conservation of these OGs, rather than their isozymes, might be driven by structural efficiency and functional breadth. For instance, OG0002425 aldolase possesses a compact catalytic structure, whereas its isozymes carry additional domains that may incur higher metabolic costs [74]. Meanwhile, the conserved FAD-dependent oxidoreductase contains a Trp_halogenase domain, which is associated with tryptophan halogenation and the biosynthesis of pyrrolnitrin, a broad-spectrum antifungal compound [75].
Upon this conserved core, M.v and M.a exhibit divergent genomic architectures and functional profiles, reflecting two distinct evolutionary trajectories [76]. M.v is characterized by greater genomic complexity and expanded regulatory potential, reflected in its larger, repeat-rich genomes with pronounced investment in signal transduction and transcriptional regulation. Combined with elevated GC content and an expanded tRNA pool, this architecture may support the metabolic flexibility required for rapid responses to microenvironmental pulses and stress [77]. It is consistent with responses to frequent desiccation-rehydration cycles and to high irradiance in biological soil crusts [18,27]. Genome-scale metabolic model reconstruction indicates that M.v lineages primarily allocate metabolic capacity to cofactor autonomy and redox/structural resilience, including biotin biosynthesis, NAD(P)H-dependent nitrite reduction, amide hydrolysis, and stress-response signaling coupled with lipid and peptidoglycan remodeling, consistent with selection for rapid physiological reconfiguration under brief nutrient pulses and recurrent envelope damage in soils (Table S2). Despite the high metabolic cost of an additional gene and elevated GC content [43,46], the expanded tRNA pool, coupled with a refined strategy of codon usage bias, decouples translation from stringent optimization and facilitates rapid cell growth [78,79]. Evolutionary analysis indicates that adaptation of M.v is primarily associated with intense positive selection on rare, lineage-specific genes and paralogous subfunctionalization [48], effectively fine-tuning the intrinsic cellular machinery to withstand localized stressors. Although tRNA synthetases form core components of the translation machinery, whose functional mutations are typically lethal, a recent study showed that mutation of the PheS aminoacyl tRNA synthetase increases bacterial tolerance to disinfectants [80]. 
Positive selection on the IS605 OrfB family transposase also establishes a direct link between genomic plasticity mechanisms and adaptive fine-tuning, indicating that evolvability itself is evolvable [81].
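As a rough illustration of how the share of gene pairs under positive selection (Ka/Ks > 1, the quantity annotated in Figure 2C) can be tallied, the following Python sketch works on a list of pairwise Ka and Ks estimates. The function name, the input layout, the example values, and the choice to skip pairs with Ks = 0 are assumptions made for illustration; they are not taken from the authors' scripts.

```python
from typing import Iterable, Tuple


def positive_selection_share(pairs: Iterable[Tuple[float, float]]) -> float:
    """Return the percentage of gene pairs with Ka/Ks > 1.

    `pairs` holds (Ka, Ks) estimates. Pairs with Ks <= 0 are skipped because
    the ratio is undefined there (an assumption of this sketch).
    """
    usable = [(ka, ks) for ka, ks in pairs if ks > 0]
    if not usable:
        return 0.0
    positive = sum(1 for ka, ks in usable if ka / ks > 1)
    return 100.0 * positive / len(usable)


# Hypothetical (Ka, Ks) values, for illustration only.
example_pairs = [(0.02, 0.10), (0.30, 0.20), (0.05, 0.00), (0.15, 0.15)]
share = positive_selection_share(example_pairs)
print(f"{share:.1f}% of usable pairs have Ka/Ks > 1")
```

In this toy input, one pair is dropped for Ks = 0 and a pair with Ka/Ks exactly equal to 1 is not counted as positively selected, so the strict inequality matters when reporting the percentage.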
In contrast, M.a may represent a specialized exploiter lineage in terms of genome organization and translation-biased functional allocation [82], although this architecture appears to be shaped by both phylogenetic constraints and environmental context (Figure S2). The enrichment in high-affinity potassium transporters, osmoadaptation, reactive nitrogen detoxification, and energy storage pathways, alongside methionine salvage cycles and histidine biosynthesis, may be compatible with persistence under fluctuating chemical conditions. Moreover, M.a shows evidence of frequent horizontal gene acquisition accompanied by multicopy expansion, consistent with its high diversity in the accessory genome [83]. The coupling between HGT and multicopy expansion has also been observed in Staphylococcus [84], suggesting that dosage amplification of horizontally acquired genes may serve as a broad mechanism for rapidly scaling metabolic capacity to exploit transient resource availability or reinforce competitive defenses.

4.3. A Hypothetical Framework for Aquatic Adaptation and Gradual Ecological Transition

We observed potential convergent retention of a cell envelope-related transcriptional attenuator, which is shared exclusively by aquatic M.v and M.a as a conserved OG. It is also present in most terrestrial M.v genomes, but at lower levels of conservation (Table S4). While its precise role remains to be experimentally verified, its predicted homology to an enzyme catalyzing the final step in cell wall teichoic acid biosynthesis [85] may suggest that common selective pressures on cell envelope integrity act on both aquatic lineages. Furthermore, both aquatic lineages may have independently acquired distinct clan AA aspartic proteases via HGT; these proteases are believed to contribute to adhesion and invasion of host tissues by degrading cell-surface structures [86].
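The conservation criterion used to classify OGs as lineage-specific or shared can be sketched in Python. The thresholds follow the Figure 3 legend (an OG counts as conserved when present in ≥95% of strains within a group, and OGs present in >20% of the reference genomes are filtered out); the function names, the identifier `OG_X`, and the toy presence data are hypothetical and do not reproduce the authors' scripts.

```python
from typing import Dict, Set

CONSERVED_MIN = 0.95  # minimum presence ratio within a focal group (Figure 3 legend)
REFERENCE_MAX = 0.20  # maximum presence ratio tolerated in reference genomes


def presence_ratio(og: str, genomes: Dict[str, Set[str]]) -> float:
    """Fraction of genomes in a group whose OG set contains `og`."""
    if not genomes:
        return 0.0
    return sum(og in ogs for ogs in genomes.values()) / len(genomes)


def conserved_in(og: str,
                 groups: Dict[str, Dict[str, Set[str]]],
                 reference: Dict[str, Set[str]]) -> Set[str]:
    """Return the focal groups in which `og` is conserved.

    An empty set is returned when the OG is too common in the reference
    genomes to be considered non-ubiquitous.
    """
    if presence_ratio(og, reference) > REFERENCE_MAX:
        return set()
    return {name for name, genomes in groups.items()
            if presence_ratio(og, genomes) >= CONSERVED_MIN}


# Toy data (hypothetical): OG_X occurs in 15 of 20 terrestrial genomes,
# in every aquatic genome, and in 1 of 10 reference genomes.
groups = {
    "terrestrial M.v": {f"t{i}": ({"OG_X"} if i < 15 else set()) for i in range(20)},
    "aquatic M.v": {"a1": {"OG_X"}, "a2": {"OG_X"}},
    "M.a": {"m1": {"OG_X"}, "m2": {"OG_X"}},
}
reference = {f"r{i}": ({"OG_X"} if i == 0 else set()) for i in range(10)}

result = conserved_in("OG_X", groups, reference)
print(sorted(result))
```

Under these assumptions, an OG found in most terrestrial genomes (75% here) still falls below the conservation threshold, so it is reported as shared only by the two aquatic groups.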
Aquatic M.v appears to be a relatively recently evolved group, as it occupies a terminal position in the phylogenetic tree [7]. Given the absence of time-calibrated phylogenetic analysis and ancestral-state reconstruction, we propose the terrestrial-to-aquatic transition as a hypothesis rather than a definitive evolutionary pathway. Importantly, the genomic profile of aquatic M.v indicates that the putative ecological transition is not an immediate architectural overhaul. Its genome size, GC content, repeat proportion, and broad functional investment remain strongly anchored to the terrestrial state. However, its plasticity-regulatory networks, competence machinery, prophage load, and defense system allocation exhibit distinct intermediate characteristics that progressively shift toward an aquatic phenotype. This pattern aligns with evolutionary discordance among genes and with the notion that ecological diversity exceeds evolutionary diversity [87,88]. It suggests a stepwise evolutionary model in which genomic architecture is conserved during initial colonization, serving as a stable, stress-resilient buffer, whereas genomic plasticity mechanisms and lineage-specific functional repertoires might be progressively rewired to match the new habitat. Aquatic M.v appears to have expanded its substrate spectrum by investing more in urea hydrolysis and tryptamine synthesis. The former is in line with a previous report that the ureC gene is enriched in riparian zones [89]. Tryptamine, in turn, is recognized as an endogenous signaling molecule that coordinates the response of plants to environmental stress and can improve the stress resistance of cyanobacteria by enhancing the antioxidant defense system [90].
Although genome quality filters were applied, differences in genome completeness, assembly contiguity, and sampling sources may affect estimates of gene repertoires. In addition, uneven representation of habitats and lineages may introduce sampling bias and phylogenetic constraint into comparative analyses. Many functional interpretations also rely on predicted annotations rather than direct experimental validation and are therefore constrained by the quality of functional databases. Thus, the ecological and evolutionary interpretations proposed here should be regarded as hypotheses requiring further validation with more balanced sampling, improved genome assemblies, and complementary experimental data.

5. Conclusions

Our results provide a comparative genomic framework for understanding the divergence of two dominant cyanobacterial species, M.v and M.a, from different habitats. Genomic plasticity, conserved functional cores, and lineage-specific innovations together outline candidate features that may underlie the dominance of both species. The conserved mechanisms and terrestrial-to-aquatic transition hypothesis identified here provide candidate targets for future studies of cyanobacterial responses to environmental perturbations. Future work should prioritize testing the functional significance of lineage-specific genes under field conditions, elucidating the temporal dynamics of genomic transitions during colonization, and extending these principles to other cyanobacterial taxa to assess the generality of the stepwise transition model for microbial ecological success across environmental boundaries.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/phycology6020064/s1. The manuscript is accompanied by supplemental materials with 10 Figures and 4 Tables. Figure S1. Verification of intra-species gene flow and evolutionary coherence. Figure S2. Comparison of genomic features among Oscillatoriales, terrestrial M.v ecotypes, aquatic M.v ecotypes, M.a, and Chroococcales. Figure S3. Distribution of repetitive sequences and small RNAs across lineages. Figure S4. Spearman correlation matrix of genomic architecture, functional traits, and genomic plasticity markers. Figure S5. Pan-genome and core-genome fitting of focal cyanobacterial lineages. Figure S6. Diversity of IS families across lineages. Figure S7. Subtype distribution and abundance of RM systems. Figure S8. Evolutionary dynamics of homologous and paralogous gene pairs. Figure S9. Motif structure of class II aldolase. Figure S10. Correlation matrix of evolutionary metrics calculated based on lineage-specific and shared genes. Table S1. Genomic quality and accessions of strains utilized in this investigation. Table S2. Comparative profiling of lineage-specific and shared metabolic reaction potentials. Table S3. Distribution and presence ratio of natural competence genes across terrestrial M.v, aquatic M.v, M.a, and reference cyanobacterial genomes. Table S4. Isozyme distribution of core conserved genes.

Author Contributions

Conceptualization, J.W. and C.H.; methodology, J.W.; formal analysis, J.W.; investigation, J.W., X.G. and Y.W.; writing—original draft preparation, J.W.; writing—review and editing, J.W. and H.L.; visualization, J.W. and H.L.; supervision, H.L. and C.H.; funding acquisition, H.L. and C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (32370125, 32571881, and 32430005) and the Natural Science Foundation for Distinguished Young Scholars of Hubei Province (2022CFA105).

Data Availability Statement

The genomes reported in this study are publicly available in the NCBI database as described in Table S1. The scripts supporting the findings in this study are deposited in GitHub (https://github.com/rosemed/comparative-genomics-Mv-Ma).

Acknowledgments

We are grateful for the technical support provided by the Freshwater Algae Culture Collection at the Institute of Hydrobiology (FACHB) and the Analysis and Testing Center of IHB. The Supercomputing Center of CAS, Wuhan Branch, assisted with the sequencing and statistical analyses.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sanchez-Baracaldo, P.; Hayes, P.K.; Blank, C.E. Morphological and habitat evolution in the Cyanobacteria using a compartmentalization approach. Geobiology 2005, 3, 145–165.
  2. Whitton, B.; Potts, M. The Ecology of Cyanobacteria. Their Diversity in Time and Space; Springer: Dordrecht, The Netherlands, 2000.
  3. Chen, M.Y.; Teng, W.K.; Zhao, L.; Hu, C.X.; Zhou, Y.K.; Han, B.P.; Song, L.R.; Shu, W.S. Comparative genomics reveals insights into cyanobacterial evolution and habitat adaptation. ISME J. 2021, 15, 211–227.
  4. Garcia-Pichel, F.; Belnap, J.; Neuer, S.; Schanz, F.J.A.S. Estimates of global cyanobacterial biomass and its distribution. Algol. Stud. 2003, 109, 213–227.
  5. Zehr, J.; Bench, S.; Carter, B.; Hewson, I.; Niazi, F.; Shi, T.; Tripp, H.; Affourtit, J. Globally Distributed Uncultivated Oceanic N2-Fixing Cyanobacteria Lack Oxygenic Photosystem II. Science 2008, 322, 1110–1112.
  6. Bowker, M.A.; Maestre, F.T.; Eldridge, D.; Belnap, J.; Castillo-Monroy, A.; Escolar, C.; Soliveres, S. Biological soil crusts (biocrusts) as a model system in community, landscape and ecosystem ecology. Biodivers. Conserv. 2014, 23, 1619–1637.
  7. Stanojkovic, A.; Skoupy, S.; Johannesson, H.; Dvorak, P. The global speciation continuum of the cyanobacterium Microcoleus. Nat. Commun. 2024, 15, 2122.
  8. Li, H.; Huo, D.; Wang, W.; Chen, Y.; Cheng, X.; Yu, G.; Li, R. Multifunctionality of biocrusts is positively predicted by network topologies consistent with interspecies facilitation. Mol. Ecol. 2020, 29, 1560–1573.
  9. Harke, M.J.; Steffen, M.M.; Gobler, C.J.; Otten, T.G.; Wilhelm, S.W.; Wood, S.A.; Paerl, H.W. A review of the global ecology, genomics, and biogeography of the toxic cyanobacterium, Microcystis spp. Harmful Algae 2016, 54, 4–20.
  10. Huo, D.; Gan, N.; Geng, R.; Cao, Q.; Song, L.; Yu, G.; Li, R. Cyanobacterial blooms in China: Diversity, distribution, and cyanotoxins. Harmful Algae 2021, 109, 102106.
  11. Lakshmikandan, M.; Li, M.; Pan, B. Cyanobacterial Blooms in Environmental Water: Causes and Solutions. Curr. Pollut. Rep. 2024, 10, 606–627.
  12. Tatters, A.; Howard, M.; Nagoda, C.; Busse, L.; Gellene, A.; Caron, D. Multiple Stressors at the Land-Sea Interface: Cyanotoxins at the Land-Sea Interface in the Southern California Bight. Toxins 2017, 9, 95.
  13. Shih, P.M.; Wu, D.; Latifi, A.; Axen, S.D.; Fewer, D.P.; Talla, E.; Calteau, A.; Cai, F.; Tandeau de Marsac, N.; Rippka, R.; et al. Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. Proc. Natl. Acad. Sci. USA 2013, 110, 1053–1058.
  14. Garcia-Pichel, F.; Lopez-Cortes, A.; Nubel, U. Phylogenetic and morphological diversity of cyanobacteria in soil desert crusts from the Colorado Plateau. Appl. Environ. Microbiol. 2001, 67, 1902–1910.
  15. Hu, C.; Gao, K.; Whitton, B.A. Semi-arid regions and deserts. In Ecology of Cyanobacteria II: Their Diversity in Space and Time; Whitton, B.A., Ed.; Springer: Dordrecht, The Netherlands, 2012; Chapter 12; pp. 345–369.
  16. Moreira, C.F.V.; Giraldo-Silva, A.; Roush, D.; Garcia-Pichel, F. Coleofasciculaceae, a monophyletic home for the Microcoleus steenstrupii complex and other desiccation-tolerant filamentous cyanobacteria. J. Phycol. 2021, 57, 1563–1579.
  17. Yamamichi, M. How does genetic architecture affect eco-evolutionary dynamics? A theoretical perspective. Phil. Trans. R. Soc. B 2022, 377, 20200504.
  18. Murik, O.; Oren, N.; Shotland, Y.; Raanan, H.; Treves, H.; Kedem, I.; Keren, N.; Hagemann, M.; Pade, N.; Kaplan, A. What distinguishes cyanobacteria able to revive after desiccation from those that cannot: The genome aspect. Environ. Microbiol. 2017, 19, 535–550.
  19. Chrismas, N.A.M.; Anesio, A.M.; Sanchez-Baracaldo, P. The future of genomics in polar and alpine cyanobacteria. FEMS Microbiol. Ecol. 2018, 94, fiy032.
  20. Li, C.; Liao, H.; Xu, L.; Wang, C.; He, N.; Wang, J.; Li, X. The adjustment of life history strategies drives the ecological adaptations of soil microbiota to aridity. Mol. Ecol. 2022, 31, 2920–2934.
  21. Muraille, E. Diversity Generator Mechanisms Are Essential Components of Biological Systems: The Two Queen Hypothesis. Front. Microbiol. 2018, 9, 223.
  22. Sriswasdi, S.; Yang, C.C.; Iwasaki, W. Generalist species drive microbial dispersion and evolution. Nat. Commun. 2017, 8, 1162.
  23. Ellegren, H.; Galtier, N. Determinants of genetic diversity. Nat. Rev. Genet. 2016, 17, 422–433.
  24. Chu, X.; Li, S.; Wang, S.; Luo, D.; Luo, H. Gene loss through pseudogenization contributes to the ecological diversification of a generalist Roseobacter lineage. ISME J. 2021, 15, 489–502.
  25. Wheatley, R.M.; MacLean, R.C. CRISPR-Cas systems restrict horizontal gene transfer in Pseudomonas aeruginosa. ISME J. 2021, 15, 1420–1433.
  26. Garcia-Pichel, F.; Wojciechowski, M. The Evolution of a Capacity to Build Supra-Cellular Ropes Enabled Filamentous Cyanobacteria to Colonize Highly Erodible Substrates. PLoS ONE 2009, 4, e7801.
  27. Rajeev, L.; da Rocha, U.N.; Klitgord, N.; Luning, E.G.; Fortney, J.; Axen, S.D.; Shih, P.M.; Bouskill, N.J.; Bowen, B.P.; Kerfeld, C.A.; et al. Dynamic cyanobacterial response to hydration and dehydration in a desert biological soil crust. ISME J. 2013, 7, 2178–2191.
  28. Couradeau, E.; Giraldo-Silva, A.; De Martini, F.; Garcia-Pichel, F. Spatial segregation of the biological soil crust microbiome around its foundational cyanobacterium, Microcoleus vaginatus, and the formation of a nitrogen-fixing cyanosphere. Microbiome 2019, 7, 55.
  29. Xiao, M.; Li, M.; Reynolds, C. Colony formation in the cyanobacterium Microcystis. Biol. Rev. 2018, 93, 1399–1420.
  30. Yang, C.; Lin, F.; Li, Q.; Li, T.; Zhao, J. Comparative genomics reveals diversified CRISPR-Cas systems of globally distributed Microcystis aeruginosa, a freshwater bloom-forming cyanobacterium. Front. Microbiol. 2015, 6, 394.
  31. Jackrel, S.L.; White, J.D.; Evans, J.T.; Buffin, K.; Hayden, K.; Sarnelle, O.; Denef, V.J. Genome evolution and host-microbiome shifts correspond with intraspecific niche divergence within harmful algal bloom-forming Microcystis aeruginosa. Mol. Biol. Evol. 2019, 28, 3994–4011.
  32. Swan, B.K.; Tupper, B.; Sczyrba, A.; Lauro, F.M.; Martinez-Garcia, M.; Gonzalez, J.M.; Luo, H.; Wright, J.J.; Landry, Z.C.; Hanson, N.W.; et al. Prevalent genome streamlining and latitudinal divergence of planktonic bacteria in the surface ocean. Proc. Natl. Acad. Sci. USA 2013, 110, 11463–11468.
  33. Dvorak, P.; Hasler, P.; Poulickova, A. Phylogeography of the Microcoleus vaginatus (Cyanobacteria) from three continents – A spatial and temporal characterization. PLoS ONE 2012, 7, e40153.
  34. Parks, D.H.; Imelfort, M.; Skennerton, C.T.; Hugenholtz, P.; Tyson, G.W. CheckM: Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015, 25, 1043–1055.
  35. Bowers, R.; Kyrpides, N.; Stepanauskas, R.; Harmon-Smith, M.; Doud, D.; Reddy, T.; Schulz, F.; Jarett, J.; Rivers, A.; Eloe-Fadrosh, E.; et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 2017, 35, 725–731.
  36. Bobay, L.; Ellis, B.; Ochman, H. ConSpeciFix: Classifying prokaryotic species based on gene flow. Bioinformatics 2018, 34, 3738–3740.
  37. Hyatt, D.; Chen, G.-L.; LoCascio, P.F.; Land, M.L.; Larimer, F.W.; Hauser, L.J. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinf. 2010, 11, 119.
  38. Chan, P.P.; Lin, B.Y.; Mak, A.J.; Lowe, T.M. tRNAscan-SE 2.0: Improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021, 49, 9077–9096.
  39. Ontiveros-Palacios, N.; Cooke, E.; Nawrocki, E.P.; Triebel, S.; Marz, M.; Rivas, E.; Griffiths-Jones, S.; Petrov, A.I.; Bateman, A.; Sweeney, B. Rfam 15: RNA families database in 2025. Nucleic Acids Res. 2024, 53, D258–D267.
  40. Cantalapiedra, C.P.; Hernández-Plaza, A.; Letunic, I.; Bork, P.; Huerta-Cepas, J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol. Biol. Evol. 2021, 38, 5825–5829.
  41. Machado, D.; Andrejev, S.; Tramontano, M.; Patil, K.R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 2018, 46, 7542–7553.
  42. Ebrahim, A.; Lerman, J.A.; Palsson, B.O.; Hyduke, D.R. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst. Biol. 2013, 7, 74.
  43. Emms, D.; Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 2019, 20, 238.
  44. Tettelin, H.; Riley, D.; Cattuto, C.; Medini, D. Comparative genomics: The bacterial pan-genome. Curr. Opin. Microbiol. 2008, 11, 472–477.
  45. Vernikos, G.S.; Parkhill, J. Interpolated variable order motifs for identification of horizontally acquired DNA: Revisiting the Salmonella pathogenicity islands. Bioinformatics 2006, 22, 2196–2203.
  46. Xie, Z.; Tang, H. ISEScan: Automated identification of insertion sequence elements in prokaryotic genomes. Bioinformatics 2017, 33, 3340–3347.
  47. Akhter, S.; Aziz, R.K.; Edwards, R.A. PhiSpy: A novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res. 2012, 40, e126.
  48. Buchfink, B.; Xie, C.; Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60.
  49. Roberts, R.J.; Vincze, T.; Posfai, J.; Macelis, D. REBASE: A database for DNA restriction and modification: Enzymes, genes and genomes. Nucleic Acids Res. 2022, 51, D629–D630.
  50. Couvin, D.; Bernheim, A.; Toffano-Nioche, C.; Touchon, M.; Michalik, J.; Neron, B.; Rocha, E.P.C.; Vergnaud, G.; Gautheret, D.; Pourcel, C. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 2018, 46, W246–W251.
  51. Nies, F.; Mielke, M.; Pochert, J.; Lamparter, T. Natural transformation of the filamentous cyanobacterium Phormidium lacuna. PLoS ONE 2020, 15, e0234440.
  52. Averhoff, B.; Kirchner, L.; Pfefferle, K.; Yaman, D. Natural transformation in Gram-negative bacteria thriving in extreme environments: From genes and genomes to proteins, structures and regulation. Extremophiles 2021, 25, 425–436.
  53. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
  54. Wang, D.; Zhang, Y.; Zhang, Z.; Zhu, J.; Yu, J. KaKs_Calculator 2.0: A toolkit incorporating gamma-series methods and sliding window strategies. Genom. Proteom. Bioinform. 2010, 8, 77–80.
  55. Chen, H.; Zwaenepoel, A. Inference of Ancient Polyploidy from Genomic Data. In Polyploidy: Methods and Protocols; Van de Peer, Y., Ed.; Springer: New York, NY, USA, 2023; pp. 3–18.
  56. Bailey, T.L.; Johnson, J.; Grant, C.E.; Noble, W.S. The MEME Suite. Nucleic Acids Res. 2015, 43, W39–W49.
  57. Quinlan, A.R.; Hall, I.M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 2010, 26, 841–842.
  58. Perez-Carrascal, O.M.; Terrat, Y.; Giani, A.; Fortin, N.; Greer, C.W.; Tromas, N.; Shapiro, B.J. Coherence of Microcystis species revealed through population genomics. ISME J. 2019, 13, 2887–2900.
  59. Brito, I.L. Examining horizontal gene transfer in microbial communities. Nat. Rev. Microbiol. 2021, 19, 442–453.
  60. Kim, S.; Cho, C.-S.; Han, K.; Lee, J. Structural variation of AluElement and human disease. Genom. Inform. 2016, 14, 70–77.
  61. White, M.; Allers, T. DNA repair in the archaea-an emerging picture. FEMS Microbiol. Rev. 2018, 42, 514–526.
  62. Braus, S.A.G.; Short, F.L.; Holz, S.; Stedman, M.J.M.; Gossert, A.D.; Hospenthal, M.K. The molecular basis of FimT-mediated DNA uptake during bacterial natural transformation. Nat. Commun. 2022, 13, 1065.
  63. Hélaine, S.; Carbonnelle, E.; Prouvensier, L.; Beretti, J.; Nassif, X.; Pelicic, V. PilX, a pilus-associated protein essential for bacterial aggregation, is a key to pilus-facilitated attachment of Neisseria meningitidis to human cells. Mol. Microbiol. 2005, 55, 65–77.
  64. Sontheimer, E.; Davidson, A. Inhibition of CRISPR-Cas systems by mobile genetic elements. Curr. Opin. Microbiol. 2017, 37, 120–127.
  65. Middelboe, M.; Traving, S.; Castillo, D.; Kalatzis, P.; Glud, R. Prophage-encoded chitinase gene supports growth of its bacterial host isolated from deep-sea sediments. ISME J. 2025, 19, wraf004.
  66. Koonin, E.; Makarova, K. Origins and evolution of CRISPR-Cas systems. Phil. Trans. R. Soc. B 2019, 374, 20180087.
  67. Chen, T.; Xiong, Y.; Zhang, J.; Zhang, Q.; Wu, J.; Xu, N.; Liu, T. Temporal dynamics, microdiversity, and ecological functions of viral communities during cyanobacterial blooms in Lake Taihu. npj Biofilms Microbiomes 2025, 11, 178.
  68. Trisolini, L.; Gambacorta, N.; Gorgoglione, R.; Montaruli, M.; Laera, L.; Colella, F.; Volpicella, M.; De Grassi, A.; Pierri, C. FAD/NADH Dependent Oxidoreductases: From Different Amino Acid Sequences to Similar Protein Shapes for Playing an Ancient Function. J. Clin. Med. 2019, 8, 2117.
  69. Peng, W.; Xu, Y.; Yin, Y.; Xie, J.; Ma, R.; Song, G.; Zhang, Z.; Quan, Q.; Jiang, Q.; Li, M.; et al. Biological characteristics of manganese transporter MntP in Klebsiella pneumoniae. mSphere 2024, 9, e0037724.
  70. Eisenhut, M. Manganese Homeostasis in Cyanobacteria. Plants 2020, 9, 18.
  71. Bosma, E.; Rau, M.; van Gijtenbeek, L.; Siedler, S. Regulation and distinct physiological roles of manganese in bacteria. FEMS Microbiol. Rev. 2021, 45, fuab028.
  72. Wang, J.; Chitsaz, F.; Derbyshire, M.K.; Gonzales, N.R.; Gwadz, M.; Lu, S.; Marchler, G.H.; Song, J.S.; Thanki, N.; Yamashita, R.A.; et al. The conserved domain database in 2023. Nucleic Acids Res. 2023, 51, D384–D388.
  73. Matsuoka, Y.; Li, X.; Bennet, V. Adducin: Structure, function and regulation. Cell. Mol. Life Sci. 2000, 57, 884–895.
  74. Held, T.; Klemmer, D.; Lässig, M. Survival of the simplest in microbial evolution. Nat. Commun. 2019, 10, 2472.
  75. Hammer, P.E.; Hill, D.S.; Lam, S.T.; Van Pée, K.H.; Ligon, J.M. Four genes from Pseudomonas fluorescens that encode the biosynthesis of pyrrolnitrin. Appl. Environ. Microbiol. 1997, 63, 2147–2154.
  76. Willis, A.; Woodhouse, J.N. Defining Cyanobacterial Species: Diversity and Description Through Genomics. Crit. Rev. Plant Sci. 2020, 39, 101–124.
  77. Liu, Q.; Liu, H.C.; Zhou, Y.G.; Xin, Y.H. Microevolution and Adaptive Strategy of Psychrophilic Species Flavobacterium bomense sp. nov. Isolated From Glaciers. Front. Microbiol. 2019, 10, 1069.
  78. Dong, H.; Nilsson, L.; Kurland, C.G. Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J. Mol. Biol. 1996, 260, 649–663.
  79. Weissman, J.L.; Hou, S.; Fuhrman, J.A. Estimating maximal microbial growth rates from cultures, metagenomes, and single cells via codon usage patterns. Proc. Natl. Acad. Sci. USA 2021, 118, e2016810118.
  80. Chen, M.; Cui, R.; Hong, S.; Zhu, W.; Yang, Q.; Li, J.; Nie, Z.; Zhang, X.; Ye, Y.; Xue, Y.; et al. Broad-spectrum tolerance to disinfectant-mediated bacterial killing due to mutation of the PheS aminoacyl tRNA synthetase. Proc. Natl. Acad. Sci. USA 2025, 122, e2412871122.
  81. Wagner, A. Adaptive evolvability through direct selection instead of indirect, second-order selection. J. Exp. Zool. Part B-Mol. Dev. Evol. 2022, 338, 395–404.
  82. Morris, J.J.; Lenski, R.E.; Zinser, E.R. The Black Queen Hypothesis: Evolution of Dependencies through Adaptive Gene Loss. mBio 2012, 3, e00036-12.
  83. Dick, G.J.; Duhaime, M.B.; Evans, J.T.; Errera, R.M.; Godwin, C.M.; Kharbush, J.J.; Nitschky, H.S.; Powers, M.A.; Vanderploeg, H.A.; Schmidt, K.C.; et al. The genetic and ecophysiological diversity of Microcystis. Environ. Microbiol. 2021, 23, 7278–7313.
  84. Chan, C.; Beiko, R.; Ragan, M. Lateral transfer of genes and gene fragments in Staphylococcus extends beyond mobile elements. J. Bacteriol. 2011, 193, 3964–3977.
  85. Koksharova, O.; Popova, A.; Plyuta, V.; Khmel, I. Four new genes of cyanobacterium Synechococcus elongatus PCC 7942 are responsible for sensitivity to 2-Nonanone. Microorganisms 2020, 8, 1234.
  86. Monika, S.; Malgorzata, B.; Zbigniew, O. Contribution of Aspartic Proteases in Candida Virulence. Protease Inhibitors against Candida Infections. Curr. Protein Pept. Sci. 2017, 18, 1050–1062.
  87. Sharp, P.M.; Shields, D.C.; Wolfe, K.H.; Li, W.H. Chromosomal location and evolutionary rate variation in enterobacterial genes. Science 1989, 246, 808–810.
  88. Rubin, I.N.; Ispolatov, Y.; Doebeli, M. Maximal ecological diversity exceeds evolutionary diversity in model ecosystems. Ecol. Lett. 2023, 26, 384–397.
  89. Fisher, K.A.; Yarwood, S.A.; James, B.R. Soil urease activity and bacterial ureC gene copy numbers: Effect of pH. Geoderma 2017, 285, 1–8.
  90. Khandelwal, A.; Patel, A.; Tiwari, S.; Prasad, S.M. Tryptamine: A novel signaling molecule alleviating salt-induced toxicity by enhancing antioxidant defense and PSII photochemistry in Anabaena PCC7120. Arch. Microbiol. 2025, 208, 64.
Figure 1. Divergence in genomic architecture and functional gene allocation among focal cyanobacterial lineages. (A) Representative photographs of the typical habitats and microscopic morphologies of the focal cyanobacterial species: Microcoleus vaginatus (M.v) inhabiting biological soil crusts in drylands, and Microcystis aeruginosa (M.a) forming cyanoblooms in freshwater ecosystems. (B) Comparison of genomic features among terrestrial M.v ecotypes, aquatic M.v ecotypes, M.a, and reference cyanobacterial genomes. (C) Proportions of genes assigned to specific COG functional categories across the four groups. In the boxplots, the center line represents the median, the box limits indicate the upper and lower quartiles, and whiskers extend to 1.5 times the interquartile range. Only COG categories showing significant divergence in over half of all pairwise comparisons were retained for visualization. Statistical significance was assessed using the Wilcoxon rank-sum test (ns, not significant; * p < 0.05; ** p < 0.01; *** p < 0.001).
Figure 2. Mechanisms of genomic plasticity and molecular evolutionary signatures. (A) Ridgeline plots illustrating the distribution of the genomic ratio (%) occupied by putative HGT regions, IS, and prophages across terrestrial M.v ecotypes, aquatic M.v ecotypes, M.a, and reference genomes. (B) Distribution of CRISPR spacer counts and complete RM system counts, reflecting the diversity and investment in defense systems across the studied lineages. (C) Violin plot showing the distribution of pairwise non-synonymous to synonymous substitution rate ratios (Ka/Ks) for core OGs in terrestrial M.v, aquatic M.v, and aquatic M.a. The center line within the boxes indicates the median values. The percentage of gene pairs under positive selection (Ka/Ks > 1) is noted above each distribution. Asterisks *** indicate significant differences (p < 0.001) between groups based on the Wilcoxon rank-sum test.
Figure 3. Distribution patterns and functional characterization of lineage-specific and shared OGs. (A) Venn diagram displaying the number of conserved OGs (present in ≥95% of strains within a group) that are exclusive to or shared among terrestrial M.v, aquatic M.v, and M.a. OGs present in >20% of the reference genomes were filtered out to highlight non-ubiquitous function. (B) Bubble plot mapping the COG functional category distribution (Y-axis) across unique and shared OG subsets (X-axis). Bubble size represents the number of equivalent OGs, and the color gradient indicates the average COG category ratio (%) within each OG. The columns ‘terrestrial M.v’, ‘aquatic M.v’, and ‘aquatic M.a’ show OGs unique to each specific ecotype or species, driving their respective specializations. The column ‘aquatic M.v & terrestrial M.v’ shows conserved core genes in Microcoleus vaginatus across different habitats. The column ‘aquatic M.v & aquatic M.a’ represents OGs shared exclusively by the two aquatic lineages, reflecting potential convergent adaptation. The column ‘all-shared’ identifies the three universal OGs maintained across all focal lineages, which may represent the fundamental requirements for ecological dominance of both species.
Figure 4. Comparative domain profiling of FAD-dependent oxidoreductase OGs. The heatmap illustrates the distribution of PFAM domains (X-axis) across various OGs (Y-axis) annotated as FAD-dependent oxidoreductases. Blue rectangles indicate the presence of a specific domain within an OG. The OG highlighted in red (OG0002323) represents the universally conserved core identified across all focal lineages. Red arrows highlight the unique domain architecture of OG0002323, which harbors both the FAD_binding_3 and Trp_halogenase domains, distinguishing it from other potential isozymes that lack this functional combination.
Figure 5. Evolutionary hotspots driving lineage-specific adaptation. The top bar chart shows the number of evolutionary hotspot OGs identified across different lineages and their intersections. Hotspots are categorized into two types: HGT hotspots (yellow), defined by an HGT ratio > 0.5, and positive selection hotspots (blue), characterized by a median Ka/Ks > 1. The bottom table shows detailed evolutionary and functional metrics for representative hotspot OGs.
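The two hotspot criteria in the Figure 5 caption (HGT ratio > 0.5; median Ka/Ks > 1) can be expressed as a short classification rule. The sketch below is illustrative only. In particular, it assumes that the "HGT ratio" of an OG is the fraction of its member genes located in putative HGT regions, which the caption does not spell out, and the function name and inputs are hypothetical.

```python
from statistics import median
from typing import Sequence, Set


def classify_hotspot(
    hgt_flags: Sequence[bool],     # one flag per member gene: inside a putative HGT region?
    kaks_values: Sequence[float],  # pairwise Ka/Ks values for the OG
    hgt_cutoff: float = 0.5,       # caption: HGT ratio > 0.5
    kaks_cutoff: float = 1.0,      # caption: median Ka/Ks > 1
) -> Set[str]:
    """Return the hotspot labels that apply to one OG (possibly none or both)."""
    labels: Set[str] = set()
    # Assumed definition: HGT ratio = share of member genes flagged as HGT.
    if hgt_flags and sum(hgt_flags) / len(hgt_flags) > hgt_cutoff:
        labels.add("HGT")
    if kaks_values and median(kaks_values) > kaks_cutoff:
        labels.add("positive_selection")
    return labels


# Hypothetical values, invented for this example.
hgt_only = classify_hotspot([True, True, False], [0.2, 0.3, 0.4])
selection_only = classify_hotspot([False, False], [1.2, 0.8, 1.5])
neither = classify_hotspot([True, False], [1.0, 1.0])
```

Both cutoffs are strict inequalities, so an OG sitting exactly at a ratio of 0.5 or a median Ka/Ks of 1 is not counted as a hotspot under this reading.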
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Wei, J.; Li, H.; Guo, X.; Wang, Y.; Hu, C. Comparative Genomic Analysis of Cosmopolitan Dominant Cyanobacteria Microcoleus vaginatus and Microcystis aeruginosa. Phycology 2026, 6, 64. https://doi.org/10.3390/phycology6020064

