Genome-Wide Investigation and Analysis of Microsatellites and Compound Microsatellites in Leptolyngbya-like Species, Cyanobacteria

Microsatellites (simple sequence repeats, SSRs) are ubiquitous in almost all known genomes. This study presents the first investigation of SSRs and compound microsatellites (CSSRs) in genomes of Leptolyngbya-like strains. The results revealed diverse patterns of distribution, abundance, density, and diversity of SSRs and CSSRs among genomes, indicating that they may be subject to rapid evolutionary change. The numbers of SSRs and CSSRs were extremely unevenly distributed among genomes, ranging from 11,086 to 24,000 and from 286 to 907, respectively. Dinucleotide SSRs were the most abundant category in 31 genomes, while the other 15 genomes followed the pattern: mono- > di- > trinucleotide SSRs. The patterns related to SSRs and CSSRs differed among phylogenetic groups. Both SSRs and CSSRs were overwhelmingly distributed in coding regions. The numbers of SSRs and CSSRs were significantly positively correlated with genome size (p < 0.01), and the number of SSRs was negatively correlated with GC content (p < 0.05). Moreover, the motifs (A/C)n and (AG)n were predominant in mononucleotide and dinucleotide SSRs, respectively, and unique motifs of CSSRs were identified in 39 genomes. This study provides the first insight into SSRs and CSSRs in genomes of Leptolyngbya-like strains and will be useful for understanding their distribution, predicting their function, and tracking their evolution. Additionally, the identified SSRs may provide an evolutionary advantage of fast adaptation to environmental changes and may play an important role in the cosmopolitan distribution of Leptolyngbya strains to globally diverse niches.


Introduction
Leptolyngbya are ecologically important cyanobacteria that often thrive in thermal environments [1,2]. Leptolyngbya strains have shown extensive biotechnological potential in pharmaceutical [3] and biodegradation applications [4]. Although an increasing number of Leptolyngbya strains have been described, the identification of Leptolyngbya-like strains has been controversial because of their simple morphology. The heterogeneity of Leptolyngbya has been questioned since the establishment of this genus [5]. Moreover, the genus Leptolyngbya is recognized as polyphyletic [6,7], and a taxonomic reevaluation is essential for a better understanding of this genus. To compensate for the limited information provided by cell morphology, molecular approaches have been widely applied to establish the correct taxonomy. Molecular markers, primarily 16S rRNA and/or the 16S-23S intergenic spacer (ITS), ease taxonomic recognition to some extent. However, they are ineffective in distinguishing closely related Leptolyngbya species [8]. With the increasing availability of genome sequences, bioinformatic analyses at the genomic level may provide new insight into the evolutionary relationships among Leptolyngbya species.
Microsatellites, also called simple sequence repeats (SSRs), are tandem repeats of motifs 1-6 bp in length [9]. SSRs are characterized by hypervariability and hypermutability [10]. They are scattered across the genome and occur in both coding and non-coding regions [11]. SSRs are crucial for determining and understanding microbial genome evolution [12]. Moreover, SSRs are important evolutionary markers that are useful for tracking SSR length variation arising from processes such as point mutation, duplication, DNA repair, and replication slippage across entire genomes [12]. In addition, an increasing number of examples illustrate that bacteria can exploit the instability of SSRs as a potential engine of genetic variability and bacterial adaptation on short evolutionary time scales without an increased overall mutation rate [13]. SSRs are also of significant interest to researchers because of their substantial applications in genetic mapping, DNA fingerprinting, population genetics, gene regulation, paternity studies, and evolution [9,14,15].
Compound microsatellites (CSSRs) are more complex sequences typically consisting of two or more adjacent SSRs, e.g., (GCA)n-(C)n-(CA)n, and are thought to be more polymorphic than SSRs [16]. Jakupciak and Wells [17] suggested that compound microsatellites result from recombination between homologous SSRs. Diverse genomic features and evolutionary traces of CSSRs have been reported between related species, e.g., Escherichia coli and lactobacilli [18].
With the development of sequencing technology and in silico methodologies, conventional SSR mining based on genomic libraries is being replaced by computational mining of the large number of available genome sequences [19,20]. The increasing availability of genome sequences and specialized bioinformatics software greatly accelerates the characterization of SSRs on a genome-wide scale, which is a prerequisite for understanding their distribution, predicting their function, and tracking their evolution.
To date, numerous Leptolyngbya genomes are available in the genomic resources of the National Center for Biotechnology Information (NCBI), offering an opportunity for SSR discovery at the genomic level. To our knowledge, no genome-wide survey of SSRs and CSSRs is available for Leptolyngbya genomes. The present study was designed to mine and analyze SSRs and CSSRs, and to reveal the patterns of distribution, abundance, density, and diversity of SSRs and CSSRs in Leptolyngbya genomes. This study provides the first insight into SSRs and CSSRs in Leptolyngbya genomes and may be useful for future studies on the function and evolution of these repeat sequences in Leptolyngbya strains.

Genome Sequences
According to the genomic resources of NCBI at the time of this study (20 October 2020), a total of 53 Leptolyngbya genomes were retrieved as a dataset for SSR and CSSR analysis. Information regarding these genomes is summarized in Table S1. Filtering was then performed to remove obvious non-Leptolyngbya strains by literature search and by BLAST search of the 16S rRNA sequence against the NCBI database. In addition, the genomic annotations of these Leptolyngbya genomes were downloaded for the corresponding analyses.
To illustrate the relationships among the strains studied, multi-locus sequence analysis (MLSA) was performed using concatenated sequences of 15 genes from each genome. These genes were frr, pgk, rplA, rplB, rplC, rplE, rplK, rplL, rplM, rplN, rplP, rplT, rpmA, rpsC, and rpsS. They are among the loci recommended for MLSA [21] and were selected from a larger gene set according to their availability and completeness in the genomes. Strains sharing far fewer of these genes with the other genomes were excluded from the phylogenetic analysis. Sequences of each gene were aligned, edited, and trimmed in MEGA7 [22]. Sequences were concatenated using BioEdit 7 [23]. Maximum-likelihood (ML) phylogenetic analyses were carried out using PhyML v3.0 [24]. Parameter settings in PhyML followed those described previously [25]. The whole-genome average nucleotide identity (ANI) between genomes was calculated using the ANI calculator with default settings [26].
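The concatenation step can be illustrated with a minimal Python sketch. It is not the BioEdit procedure used in the study; the function name, the nested-dictionary input layout, and the rule of dropping any strain that lacks one of the genes are assumptions made for illustration only.

```python
def concatenate_alignments(gene_alignments, gene_order):
    """Concatenate per-gene aligned sequences into one supermatrix row per strain.

    gene_alignments: {gene_name: {strain_name: aligned_sequence}} (hypothetical layout)
    gene_order:      list of gene names giving the concatenation order
    Strains missing any gene are dropped (an assumption of this sketch).
    """
    if not gene_order:
        return {}
    # Keep only strains that are present in every gene alignment.
    shared = set(gene_alignments[gene_order[0]])
    for gene in gene_order[1:]:
        shared &= set(gene_alignments[gene])
    # Join the aligned sequences gene by gene, in a fixed order.
    return {
        strain: "".join(gene_alignments[gene][strain] for gene in gene_order)
        for strain in sorted(shared)
    }
```

Fixing the gene order keeps the alignment columns comparable across strains, which is what allows the concatenated sequences to be analysed as a single alignment.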

Identification and Analysis of SSRs and CSSRs
Perfect SSRs and CSSRs were identified in each genome using the repeat search engine Krait v1.2.2 [27]. In light of the small genomes of Leptolyngbya strains, the minimum numbers of repeats were set to 6, 3, 3, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs, respectively [16]. The maximum distance allowed between any two adjacent SSRs (dmax) was set to 10 bp for the CSSR analysis. The other parameters in Krait were kept at their defaults. All identified perfect SSRs and CSSRs were mapped to coding and non-coding regions by feature coordinates using Krait. The complexity and motifs of CSSRs were investigated as well.
To mitigate the effect of genome size on the comparative analysis, the numbers of SSRs and CSSRs were normalized as relative abundance (RA), the number of SSRs or CSSRs per kb of the genome sequence studied, and relative density (RD), the total length (bp) contributed by SSRs or CSSRs per kb of the genome sequence studied.
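The search criteria above can be sketched in Python as follows. This is an illustrative left-to-right scan, not Krait's algorithm; the function names, the tuple layout, and the tie-breaking rule (keep the longest repeat that starts at a position) are assumptions. Only the minimum repeat counts and dmax = 10 bp come from the text.

```python
# Minimum repeat counts for motif lengths 1-6 bp, as stated in the text.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeats) tuples for perfect SSRs (0-based, end exclusive)."""
    seq = seq.upper()
    hits = []
    pos = 0
    while pos < len(seq):
        best = None
        for k in range(1, 7):
            motif = seq[pos:pos + k]
            if len(motif) < k or "N" in motif:
                continue
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "CACA".
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            n = 1
            while seq.startswith(motif, pos + n * k):
                n += 1
            # Keep the longest qualifying repeat starting at this position.
            if n >= min_repeats[k] and (best is None or n * k > best[1] - best[0]):
                best = (pos, pos + n * k, motif, n)
        if best:
            hits.append(best)
            pos = best[1]  # continue after the repeat, so reported SSRs do not overlap
        else:
            pos += 1
    return hits


def group_cssrs(ssrs, dmax=10):
    """Chain neighbouring SSRs separated by at most dmax bp; chains of two or more are CSSRs."""
    cssrs, chain = [], []
    for ssr in sorted(ssrs):
        if chain and ssr[0] - chain[-1][1] > dmax:
            if len(chain) >= 2:
                cssrs.append(chain)
            chain = []
        chain.append(ssr)
    if len(chain) >= 2:
        cssrs.append(chain)
    return cssrs
```

Calling `group_cssrs(find_perfect_ssrs(sequence))` on a genome sequence string returns one list per CSSR, and the length of each list is the complexity of that CSSR. A dedicated tool may resolve overlapping candidates or report standardized motifs differently from this sketch.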
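The two normalizations can be written as a short Python sketch. The function names are hypothetical, and the numbers in the usage note are illustrative values rather than data from this study.

```python
def relative_abundance(n_repeats, genome_bp):
    """RA: number of SSRs (or CSSRs) per kb of genome sequence."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return n_repeats / (genome_bp / 1000)


def relative_density(total_repeat_bp, genome_bp):
    """RD: total bp covered by SSRs (or CSSRs) per kb of genome sequence."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return total_repeat_bp / (genome_bp / 1000)
```

For a hypothetical 5,000,000 bp genome containing 12,500 SSRs that together span 82,500 bp, these functions give RA = 2.5/kb and RD = 16.5 bp/kb.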

Statistical Analysis
To facilitate interpretation, the statistical terms used in this study are abbreviated as follows. nSSR: number of SSRs in each genome; nCSSR: number of CSSRs in each genome; cSSR: an individual SSR that is part of a CSSR; C: complexity, defined as the number of cSSRs in a CSSR; ncSSR: number of cSSRs in each genome; and cSSR%: percentage of nSSR accounted for by ncSSR in each genome (cSSR% = ncSSR/nSSR).
The Pearson correlation coefficient (ρ) was calculated using a custom R script to assess the associations between variables, including genome size, GC content, nSSR, nCSSR, and ncSSR. Significance levels of 0.05 and 0.01 were applied. The significance of CSSR representation in each genome was evaluated statistically by an index, Z [28]. The Z scores were computed from the following quantities: n, the number of genomes studied (n = 46); i, the genome index; ncSSR_i, the number of cSSRs in genome i; nCSSR_i (also called nCSSR_obs), the observed number of CSSRs in genome i; C̄, the average complexity across the 46 genomes (C̄ = 2.069 in this study); and nCSSR_exp, the expected number of CSSRs in genome i.
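These definitions can be tied together in a small Python sketch. The function name, the input layout (one complexity value per CSSR in a genome), and the returned dictionary keys are assumptions for illustration.

```python
def cssr_summary(n_ssr, complexities):
    """Summarize CSSR statistics for one genome.

    n_ssr:        nSSR, the total number of SSRs in the genome
    complexities: one value of C per CSSR (number of cSSRs it contains)
    """
    n_cssr = len(complexities)   # nCSSR
    nc_ssr = sum(complexities)   # ncSSR: SSRs that take part in any CSSR
    return {
        "nCSSR": n_cssr,
        "ncSSR": nc_ssr,
        "cSSR%": 100 * nc_ssr / n_ssr if n_ssr else 0.0,
        "mean_C": nc_ssr / n_cssr if n_cssr else 0.0,
    }
```

Because every CSSR contains at least two cSSRs, ncSSR is always at least twice nCSSR, so the mean complexity of a genome cannot fall below 2.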
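The correlation step can be sketched in plain Python. The study used a custom R script that is not reproduced here, so this is an equivalent illustration rather than the authors' code. The function names are hypothetical, and the t statistic shown is the standard test of a Pearson coefficient against zero with n - 2 degrees of freedom.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

The resulting t value is compared with the critical value of the t distribution at the chosen significance level (0.05 or 0.01), taken from a statistical table or a statistics library.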

Phylogenetic Relationship of Leptolyngbya Strains
Four strains were excluded from the analysis because they have been proposed as a new genus [29], generating a dataset comprising 49 genomes of Leptolyngbya-like strains. Based on the availability and quality of the individual loci in each genome, concatenated sequences of 15 genes representing 43 Leptolyngbya-like strains and 11 reference strains were constructed to infer the phylogenetic relationships. The reference strains belonged to the families Leptolyngbyaceae and Oculatellaceae, including Alkalinema, Leptodesmis, Myxacorys, Neosynechococcus, Oculatella, Pantanalinema, Phormidesmis, Stenomitos and Thermoleptolyngbya, with Gloeobacter as the outgroup. The ML phylogenetic tree (Figure 1) inferred from the concatenated sequences categorized the cyanobacterial strains into well-supported clades. Ten strains were grouped into the previously established genus Leptolyngbya sensu stricto [30], and the other Leptolyngbya-like strains were scattered across the tree (Figure 1). Notably, Leptolyngbya sp. FACHB-671 and FACHB-541 clustered closely with Oculatella sp. FACHB28, while Leptolyngbya sp. FACHB-261 was an outlier (Figure 1). This result indicated an evident conflict between phylogeny and taxonomy, so these three genomes were removed from subsequent analysis. The ML tree (Figure 1) and the distance matrix of the concatenated sequences (Table S2) revealed considerable genetic divergence between strains within Leptolyngbya sensu stricto and the other Leptolyngbya-like strains. The Leptolyngbya strains did not form a monophyletic clade but were mixed with the reference strains (Figure 1), suggesting interspecific heterogeneity within this genus. This result is not unexpected, since Leptolyngbya is well known to be polyphyletic [6]. Furthermore, ANI analysis showed that the ANI values between Leptolyngbya sensu stricto and the other Leptolyngbya-like strains were all below 80% (Table S3), which conforms to the typical values (<80% ANI) for organisms of different genera [31].
Taken together, it remains quite challenging to determine the actual taxonomy of these Leptolyngbya-like strains without further evidence, such as morphology. Thus, dedicated work in separate studies is required to clarify the taxonomic positions of these Leptolyngbya-like strains with polyphasic approaches. For convenience of presentation and comparison, the Leptolyngbya-like strains were grouped according to the phylogenetic tree (Figure 1) and the ANI values (Table S3) for genus delimitation.

Number, Relative Abundance and Density of SSRs and CSSRs
A total of 46 genomes of Leptolyngbya-like strains were retained for SSR and CSSR analysis. These strains originated from diverse habitats (Table 1), including freshwater, marine, hot spring, terrestrial, and other environments. Across the 46 genomes, a total of 741,115 perfect SSRs were identified (Table 1). The number of SSRs was extremely unevenly distributed among genomes, ranging from 11,086 to 24,000. The relative abundance (RA) and relative density (RD) of SSRs both differed markedly among the genomes, ranging from 2.00 to 3.32/kb and from 13.20 to 24.08 bp/kb, respectively. The RA and RD of the genus Leptolyngbya sensu stricto varied from 2.47 to 2.53/kb and from 16.30 to 16.67 bp/kb, respectively. The RA and RD values of the genus Leptolyngbya sensu stricto were higher than those of Clades A, C, and D but lower than those of Clades B, E, F, and G. The RA and RD values of the other, ungrouped Leptolyngbya-like strains were either higher or lower than those of the genus Leptolyngbya sensu stricto.
There were 24,703 CSSRs identified in the 46 genomes of Leptolyngbya-like strains (Table 1). Similar to SSRs, the number of CSSRs varied widely among genomes, from 286 to 907. Large variations were also exhibited by the RA and RD of CSSRs (Table 1), which ranged from 0.05 to 0.14/kb and from 0.61 to 2.11 bp/kb, respectively. Likewise, strains within the genus Leptolyngbya sensu stricto or within the same clade showed consistent RA and RD of CSSRs. The genus Leptolyngbya sensu stricto, together with Clades C and D, showed similar RA and RD of CSSRs, with values higher than those of Clade A and lower than those of Clades B, E, F, and G (Table 1).
The number of cSSRs in each genome (ncSSR) ranged from 580 to 1865 (Table 1). The cSSR% values indicated that only a small fraction of all SSRs (less than 10%) in each genome were part of a CSSR (Table 1). The cSSR% differed between the genus Leptolyngbya sensu stricto and the clades, e.g., 5.88-6.42% in the genus Leptolyngbya sensu stricto, 4.64-5.16% in Clade A, and 6.92-9.07% in Clade G. These results indicated that the proportion of SSRs participating in CSSRs was inconsistent among strains, even though strains shared similar RA and RD of SSRs and CSSRs. The Z scores, which measure the significance of CSSR representation, indicated that nCSSR_obs was less than nCSSR_exp in 17 genomes, while the opposite was true in the remaining 29 genomes. The greatest statistical significance was found for the genome of Leptolyngbya sp. ULC073bin1 (S39).

Distribution and Diversity of SSRs
As shown in Figure 2a, mononucleotide, dinucleotide, and trinucleotide SSRs accounted for the vast majority of SSRs in each genome, from 98.58% to 99.50%. However, the most abundant category differed among genomes. Dinucleotide SSRs were the most abundant in 31 genomes, accounting for 39.41 to 52.11% of all SSRs, followed by mononucleotide and trinucleotide SSRs, while the other 15 genomes followed the pattern: mono- > di- > trinucleotide SSRs. The proportion of tetranucleotide SSRs was higher than that of pentanucleotide and hexanucleotide SSRs in each genome. The same distribution pattern of SSR types existed within the genus Leptolyngbya sensu stricto and Clades A, B, C, D and F (di- > mono- > trinucleotide SSRs), and within Clade G (mono- > di- > trinucleotide SSRs), whereas both patterns occurred within Clade E. SSRs were overwhelmingly distributed in the coding regions of all 46 genomes analyzed (Figure 2b), accounting for 62.40-84.45% of SSRs. Only low percentages of SSRs (15.55-37.60%) were located in non-coding regions.
A heatmap (Figure S1) was constructed to show the relative abundance of the 332 standard motifs identified in each genome. There were evident distinctions among genomes in the relative abundance of motifs of the mononucleotide (0.07-1.02), dinucleotide (0.01-0.86) and trinucleotide (0.002-0.999) repeat types (Figure 3). Although the genus Leptolyngbya sensu stricto showed consistent relative abundance of standard motifs, variation was evident within the clades. The motif (A/C)n was the predominant mononucleotide repeat type in the genomes. (AG)n, (AC)n, and (CG)n were the three most abundant dinucleotide SSR motifs, among which (AG)n was particularly dominant. Among the trinucleotide repeat types, (ACG)n and (CCG)n were the most abundant motifs. The relative abundances of motifs of the tetranucleotide, pentanucleotide, and hexanucleotide repeat types were similar among genomes (Figure S1).

Complexity, Motifs and Distribution of CSSRs
The complexity of CSSRs in the 46 genomes ranged from 2 to 6 (Table S4). The vast majority of CSSRs had a complexity of 2, accounting for 93.84% of all CSSRs (Table S4), and the count of CSSRs decreased with increasing complexity. Slight differences in CSSR complexity were noticed within the genus Leptolyngbya sensu stricto or the clades. The complexity of CSSRs for Clade G was up to 6, while that for Clade B was up to 4, and that for the other clades ranged from 2 to 5. These results suggested a diversity of motifs among genomes or groups. Moreover, unique motifs were identified in 39 genomes (Table S5), and the number of unique motifs varied sharply among genomes, from 1 (L. boryana PCC 6306) to 84 (Leptolyngbya sp. SIO1E4). Unique motifs were identified in only three out of ten strains from the genus Leptolyngbya sensu stricto.
Like SSRs, CSSRs were predominantly located in the coding regions of all 46 genomes analyzed (Figure 4), accounting for 50.6-81.0% of CSSRs. The distribution pattern of SSRs and CSSRs obtained in the present study was in accordance with the prevailing finding that SSRs and CSSRs in prokaryotic genomes are located more frequently in coding regions than in non-coding regions [32,33]. Interestingly, the percentage of CSSRs in non-coding regions increased with increasing complexity.

Discussion
In this study, bioinformatics tools were employed to reveal the patterns of distribution, abundance, density, and diversity of SSRs and CSSRs in 46 genomes of Leptolyngbya-like strains. The results indicated dissimilar patterns of SSR distribution among these genomes (Table 1, Figure 2), suggesting that SSRs might contribute to the genetic diversity of Leptolyngbya genomes and that they may be subject to rapid evolutionary change [34]. The highly consistent patterns of SSR distribution observed within the genus Leptolyngbya sensu stricto or within the clades implied that the dissimilar patterns among groups probably reflect genetic divergence. The genomes of Leptolyngbya-like strains differed in the most abundant repeat type, which was either mononucleotide or dinucleotide. This was in accordance with the prevalence of mononucleotide or dinucleotide repeats in prokaryotic genomes [35], although trinucleotide SSRs are sometimes the most abundant category of SSRs (e.g., in Cyanobium gracile PCC 6307). Mononucleotide repeats are typically the dominant SSRs in eukaryotic genomes, for example across all human chromosomes [36].
The smaller motifs were predominant in the genomes of Leptolyngbya-like strains (Figure S1), and occurrence decreased with increasing motif length. This trend is shared by a wide range of organisms [37,38]. The motif (A/C)n was the predominant mononucleotide repeat type in the genomes, which was in agreement with the pattern in other cyanobacteria [35]. Among the dinucleotide SSRs in the genomes, (AG)n was the predominant motif, whereas other motifs, such as (AT)n, predominate in some other cyanobacteria, e.g., Calothrix.
The genome sizes of the 46 Leptolyngbya-like strains ranged from 3.9 Mb to 8.8 Mb (Table 1). The correlation analysis suggested a positive correlation between nSSR/nCSSR and genome size (ρ = 0.81/0.52, p < 0.01) (Table 2), although in several cases smaller genomes contained more SSRs or CSSRs (Table 1). The GC content of the 46 genomes varied from 45.71% to 59.44% (Table 1). Interestingly, GC content had no significant correlation with nCSSR (ρ = 0.18, p > 0.05) but a significant negative correlation with nSSR (ρ = −0.30, p < 0.05). Furthermore, the GC content of SSRs, which is influenced by the GC content of the genome, might affect marker development because GC-rich SSRs are difficult to amplify by PCR. In this study, the SSRs of the 46 genomes of Leptolyngbya-like strains appeared to be AT-rich (Figure S1), which might be valuable for the development of SSR markers.
The complexity analysis of CSSRs in the 46 genomes of Leptolyngbya-like strains showed that these CSSRs primarily comprised two SSRs (complexity = 2) (Table S4). As complexity increased, the number of CSSRs decreased rapidly. The motifs in CSSRs were quite diverse in each surveyed genome (Table S4). In addition, a large number of unique motifs were identified in 39 genomes (Table S4). These unique motifs may have arisen for two reasons. First, the diverse SSR types in each genome generated various motifs (SSR-couples). Second, mutations within SSRs are reported to be frequent [39]. The surveyed Leptolyngbya-like strains came from diverse niches (Table 1) and may readily have accumulated diverse mutations during evolution. This hypothesis is supported by the unique motifs obtained in this study that differed from each other by just one or two single mutations (Table S4).
The SSRs and CSSRs identified in this study were predominantly distributed in the coding regions of each genome (Figures 2b and 4). This is probably because bacterial genomes are more compact than those of eukaryotes. This result indicated a potential functional role of SSRs and CSSRs in influencing transcription, protein function, gene regulation, and genome organization [9,40]. In particular, SSRs within genes should be subject to stronger selective pressure than those in other genomic regions because of their functional importance [41]. The selection pressure may result in a systematic directional change of the respective repeat number that leads, finally, to desirable activity levels of the adaptationally relevant genes and relaxation of the stress, i.e., adaptation [42]. In this study, a large number of SSRs were identified in the 46 genomes of Leptolyngbya-like strains (Table 1). These SSRs may provide an evolutionary advantage of fast adaptation to environmental changes in Leptolyngbya and may play an important role in the cosmopolitan distribution of Leptolyngbya-like strains to globally diverse niches (Table 1). The different percentages of SSRs located in coding regions among groups or phylotypes may suggest different levels of involvement in function or evolution. However, the possible functions, as well as the mutational mechanisms, remain mostly unknown [43]. At present, replication slippage and recombination are the most widely accepted explanations for SSR variation. Moreover, overlapping genes are widespread in prokaryotic genomes, which may amplify the effects of SSRs or CSSRs. Future studies could increasingly unravel the evolutionary role of SSRs in regulating gene expression under diverse environmental stresses.
Variation in genome size and in the distribution patterns of SSRs and CSSRs was evident in the surveyed genomes of Leptolyngbya-like strains. This might be attributed to the fact that Leptolyngbya has been recognized as polyphyletic [6] and that distinct phylotypes existed in the current dataset (Figure 1 and Table S2). Consistent distribution patterns of SSRs and CSSRs were observed within the genus Leptolyngbya sensu stricto or within the clades. According to the public microsatellite database (http://big.cdu.edu.cn/psmd/, updated in July 2020, accessed on 20 October 2021), the SSR number of the genomes of Leptolyngbya-like strains (11,086 to 24,000) in this study is comparable to that of other cyanobacterial genomes (2283 to 53,041). However, evident variations in SSR number were observed at the genus level, such as Thermosynechococcus (7490 to 7724) and Tolypothrix (32,706 to 37,800). A similar situation was also noticed for CSSRs and cSSR% between Leptolyngbya genomes and other cyanobacterial genomes. The ANI analysis indicated that several genera might exist among the Leptolyngbya-like strains (Table S3). This could to some extent explain the observed variations between Leptolyngbya sensu stricto and the other Leptolyngbya-like strains at the genus level. However, the results of the phylogenetic and ANI analyses may not suffice for a definitive taxonomic assignment, given the polyphyletic nature of Leptolyngbya. Recently, polyphasic approaches successfully separated dozens of Leptolyngbya-like strains as new genera from Leptolyngbya sensu stricto [44,45]. Therefore, reevaluation of these Leptolyngbya-like strains using polyphasic approaches is crucial in the future.
In conclusion, a thorough survey was completed to reveal the patterns of distribution, abundance, density, and diversity of SSRs and CSSRs in genomes of Leptolyngbya sensu stricto and closely related species. The variations observed are consistent with the consensus that SSRs contribute to genome polymorphism [13]. In addition, the variability of SSRs may be considered one of the drivers of genomic plasticity, thereby allowing targeted mutation and evolution. Further, the identified SSRs may provide an evolutionary advantage of fast adaptation to environmental changes and may play an important role in the cosmopolitan distribution of Leptolyngbya strains to globally diverse niches. The current data on SSRs and CSSRs will serve as a prerequisite for understanding the genomic distribution, evolution, and functions of SSRs and CSSRs in Leptolyngbya genomes.