The Landscape of Genome-Wide and Gender-Specific Microsatellites in Indo-Pacific Humpback Dolphin and Potential Applications in Cetacean Resource Investigation

Abstract: Microsatellites are an important genomic feature and a valuable resource for variety identification, genetic diversity assessment, phylogenetic analysis, and comparative and conservation genomics research. Here, we developed a comprehensive set of microsatellites through genome-wide mining for the threatened Indo-Pacific humpback dolphin (Sousa chinensis). We found 87,757 microsatellites with 2–6 bp nucleotide motifs, corresponding to an average density of about 32.5 microsatellites per megabase. Approximately 97.8% of the previously published markers were consistent with the markers developed in this study. About 75.3% of the microsatellites had dinucleotide motifs, followed by tetranucleotide motifs (17.4%), the same composition pattern as in other cetaceans. The microsatellites were not evenly distributed in the S. chinensis genome: most were located in non-coding regions, with only about 0.5% of the markers located in coding regions. The microsatellite-containing genes were mainly functionally enriched in methylation processes, suggesting potential impacts of microsatellites on biological functions. Candidate polymorphic microsatellites were developed between male and female S. chinensis, which is expected to lay the foundation for genetic diversity investigation in cetaceans. The male-specific markers of the Indo-Pacific humpback dolphin provide comprehensive and representative candidate markers for sex identification, offering a potential molecular tool for further analysis of the population structure and social behavior of wild populations, population trend evaluation, and species conservation management.


Introduction
Microsatellites, or simple sequence repeats (SSRs), are short tandem repeats of 1–6 nucleotide motifs that are ubiquitous in prokaryotes and eukaryotes. Compared with other genetic molecular markers, such as restriction fragment length polymorphism, random amplified polymorphic DNA, amplified fragment length polymorphism, sequence-related amplified polymorphism and target region amplification polymorphism, microsatellites are characterized by high distribution frequency, codominant inheritance, reproducibility and high polymorphism [1]. Recent studies have demonstrated that microsatellites play important roles in affecting gene activity, chromatin organization, and DNA metabolic processes [2]. In addition, microsatellites are valuable tools for population genetic analyses since they are a rich source of hypervariable codominant markers [3,4]. They are currently among the most important codominant markers and have been extensively used in quantitative trait loci (QTL) mapping, genetic diversity studies, marker-assisted selective breeding, and evolutionary studies [1,5–7]. However, the development of novel markers by traditional methods is still a labor-intensive and time-consuming process. Recently, advances in next-generation sequencing (NGS) have accelerated microsatellite marker development at the genome level, making it fast and cost-effective compared with traditional methods.
Microsatellite mining has been widely applied and improved in eukaryotic genomes, such as those of Gossypium, Prunus, and Vigna species, red deer, buffalo, and common carp [4,8–14]. Furthermore, microsatellites identified across 719 eukaryotes revealed the SSR distribution across evolutionarily related species, including protists, plants, fungi, invertebrates and vertebrates [15]. In cetaceans, valuable microsatellites have been developed and broadly used in the study of genetic diversity [16–21]. Published PCR primer pairs were used to amplify alleles to determine the haplotype relationships and genetic diversity of bottlenose dolphins (Tursiops truncatus) living together in a Japanese aquarium [22]. Nineteen novel tetranucleotide microsatellite markers were isolated from Tursiops aduncus to improve genotyping accuracy for large-scale population-wide paternity and relatedness assessments [19]. Microsatellites have also been applied to understand the genetic differentiation and speciation processes in Chilean dolphins [21]. However, few SSR markers have been reported for S. chinensis to date [18,23,24]. Genome-wide characterization of microsatellites is imperative for better species identification and conservation research, particularly for the threatened S. chinensis. The identification and characterization of microsatellites, and thus the establishment of a database, will contribute to the study of the genetic diversity and population structure of S. chinensis and other cetaceans.
The threatened S. chinensis generally inhabits shallow coastal waters and is vulnerable to human activities. However, its population assessments mainly depend on ship-based field surveys, photo identification, and population model assumptions [25–28]. Here, based on the genome sequences of S. chinensis that we previously published [29], we focused on unraveling the characteristics, distribution, and functional effects of S. chinensis microsatellites. The comprehensive molecular markers will provide a useful database and biological tools for genetic diversity assessment, sex identification, population structure analysis and the delineation of management units of S. chinensis. Moreover, common features and composition trends of microsatellites among cetacean species are explored. These analyses also have important implications for genetic diversity surveys and the conservation of cetaceans.

Microsatellites Distribution and GO (Gene Ontology) Function Enrichment
Because microsatellites are shorter than genes, we took the start and end positions of each microsatellite and determined, from the gene annotation coordinates of the genome, whether they fell within gene coding regions. The genes containing microsatellites were collected and subjected to GO functional enrichment analysis in three steps: first, all target genes were mapped to GO terms in the database (http://www.geneontology.org/ (accessed on 20 December 2017)); second, the number of genes per GO term was counted; and finally, a hypergeometric test against the genomic background was used to find GO terms significantly enriched in the target genes. The p-value was calculated as:
$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$
where N is the number of all genes with GO annotations; n is the number of target genes in N; M is the number of all genes annotated to a given GO term; and m is the number of target genes in M. The calculated p-value was Bonferroni corrected, and a corrected p-value < 0.05 was used as the threshold. GO terms fulfilling this condition were defined as significantly enriched in the target genes.
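The hypergeometric test and Bonferroni correction described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names and the toy counts are hypothetical, and the upper-tail probability P(X ≥ m) is summed directly, which is equivalent to one minus the lower tail.

```python
from math import comb

def enrichment_pvalue(N, n, M, m):
    """Upper-tail hypergeometric probability P(X >= m).

    N: genes with GO annotation; n: target genes among them;
    M: genes annotated to the GO term; m: target genes with the term.
    """
    total = comb(N, n)
    # Sum P(X = i) for i = m..min(M, n); equals 1 - P(X <= m - 1).
    upper = sum(comb(M, i) * comb(N - M, n - i)
                for i in range(m, min(M, n) + 1))
    return upper / total

def enriched_terms(term_counts, N, n, alpha=0.05):
    """Return (term, corrected p) pairs passing the Bonferroni threshold."""
    tests = len(term_counts)
    results = []
    for term, (M, m) in term_counts.items():
        p_adj = min(1.0, enrichment_pvalue(N, n, M, m) * tests)
        if p_adj < alpha:
            results.append((term, p_adj))
    return sorted(results, key=lambda item: item[1])

# Hypothetical toy counts (not the study's data): term -> (M, m)
toy_counts = {"term_A": (40, 12), "term_B": (900, 12)}
significant = enriched_terms(toy_counts, N=15000, n=195)
```

Exact integer binomials are used so that very small p-values are not lost to floating-point cancellation; for large term sets a statistics library would be the more practical choice.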

Microsatellite Markers Identification
The 46 published SSR sequences of S. chinensis were derived from previous studies, in which these microsatellites were identified via PCR amplification in S. chinensis. We aligned the microsatellite sequences to the S. chinensis genome sequence using BLAT (v. 36) (-t=dna -q=dna -out=blast8) with an e-value < 1 × 10^−5 [33]. The locations and motifs of the published markers were compared with our results, and consistent markers were considered verified. The remaining markers were shorter SSRs, which were verified by repeating the SSR search with looser parameters: SSR flank length = 20 bp and SSR length ≥ 8 bp.
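As an illustration of the e-value filtering step, the sketch below parses 12-column tab-separated blast8 lines and keeps, for each published marker, the best hit below the cutoff. The function name, the choice to keep only the lowest e-value hit per marker, and the example records are assumptions made for this sketch, not details given in the text.

```python
def best_hits(blast8_lines, max_evalue=1e-5):
    """Keep, per query marker, the lowest e-value hit that passes the cutoff."""
    best = {}
    for line in blast8_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        # Columns 9 and 10 are subject start/end; column 11 is the e-value.
        s_start, s_end = int(fields[8]), int(fields[9])
        evalue = float(fields[10])
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < best[query]["evalue"]:
            best[query] = {
                "subject": subject,
                "start": min(s_start, s_end),
                "end": max(s_start, s_end),
                "evalue": evalue,
            }
    return best

# Hypothetical records for illustration only.
example = [
    "marker_A\tscaffold_1\t100.00\t150\t0\t0\t1\t150\t5001\t5150\t1e-70\t290",
    "marker_A\tscaffold_9\t85.00\t40\t6\t0\t1\t40\t77\t116\t0.002\t38",
    "marker_B\tscaffold_2\t80.00\t30\t6\t0\t1\t30\t900\t871\t0.5\t25",
]
hits = best_hits(example)
```

The retained genomic interval of each marker can then be compared with the coordinates of the microsatellites mined from the genome.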

Male S. chinensis Specific Markers Development
As the published male S. chinensis genome has no Y chromosome information, its unplaced scaffolds were linked into a pseudo-chromosome named "chrUN" [30]. To obtain male-specific microsatellites from the S. chinensis genome, we extracted the Y chromosome unlocalized genomic scaffolds (NW_022983135.1, NW_022983136.1, NW_022983137.1, NW_022983138.1) from the published bottlenose dolphin genome (GenBank: GCF_011762595.1). Subsequently, we aligned these Y chromosome scaffolds to the published "chrUN" sequences [30] to identify the Y chromosome candidate regions in the male S. chinensis genome.
Before alignment, we separated the "chrUN" sequences into 32,373 contigs (chrUN_1~chrUN_32373) by splitting them at the N positions. The alignment was performed with MUMmer (v4.0.0) using default parameters, and the results were filtered with a minimum alignment length of 10 kb. The retained contigs were taken as candidate Y chromosome contigs and were searched for male-specific microsatellites.
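The two preprocessing steps described here, splitting a scaffold at runs of N and keeping only long alignments, can be sketched as below. The function names and the dictionary layout of an alignment record are hypothetical; parsing of actual MUMmer output is not shown.

```python
import re

def split_at_gaps(name, sequence):
    """Split a sequence at runs of N into (contig_id, offset, bases) tuples."""
    contigs = []
    for index, match in enumerate(re.finditer(r"[^Nn]+", sequence), start=1):
        contigs.append((f"{name}_{index}", match.start(), match.group()))
    return contigs

def filter_alignments(alignments, min_length=10_000):
    """Keep alignment records whose aligned length is at least min_length bp."""
    return [aln for aln in alignments if aln["length"] >= min_length]

# Toy input: two gap runs give three contigs.
contigs = split_at_gaps("chrUN", "ACGTNNNNGGCCNAT")

# Hypothetical alignment records (contig id and aligned length in bp).
kept = filter_alignments([
    {"contig": "chrUN_12", "length": 25_300},
    {"contig": "chrUN_40", "length": 3_100},
])
```

Keeping the offset of each contig makes it possible to translate microsatellite positions found on a contig back to coordinates on the original "chrUN" sequence.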

Isolation and Characterization of Genomic Microsatellites in S. chinensis
To isolate microsatellite sequences, we scanned the S. chinensis genome [29] for microsatellites with dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide motifs. A total of 87,758 microsatellites were detected, with an average frequency of 32.5 microsatellites per megabase (Tables S1 and S2). Among these 87,758 microsatellites, dinucleotides were the most abundant (66,040), accounting for 75.25%, followed by tetranucleotides (15,257; 17.39%), pentanucleotides (2774; 3.16%), hexanucleotides (1961; 2.23%), and trinucleotides (1726; 1.97%) (Figure 1A, Table S3). Dinucleotides and tetranucleotides were thus the major microsatellite types in the S. chinensis genome (Figure 1A), as has also been observed in other cetacean species [34]. Among the dinucleotide microsatellites, AC/GT motifs (73.03%) were the most frequent, followed by AT/AT (18.57%), AG/CT (8.37%), and CG/CG (0.03%) motifs (Figure 1B, Table S3). On the whole, the composition of the microsatellites of the S. chinensis genome was consistent with that in mammals such as humans, primates, rats, camels, and pigs [35–40]. To date, genome-wide microsatellites have been mined for many aquatic species, such as jellyfish, catfish and eel. For instance, 142,616 microsatellites were found in cannonball jellyfish (Stomolophus sp.) by next-generation sequencing, which laid the foundation for further population genetics studies [41]. Similarly, 24 polymorphic microsatellite loci were identified in the Neotropical freshwater fish Ichthyoelephas longirostris, 19 of which were used to assess its genetic diversity and structure in three Colombian rivers of the Magdalena basin [42].
The population structure and genetic diversity of yellow catfish (Pelteobagrus fulvidraco) were assessed using eight microsatellites in combination with capillary electrophoresis [43]. Recently, SSR markers were detected in swamp eels, and the results showed that dinucleotide microsatellites were the most abundant type, accounting for 71% of all microsatellite loci, and that AC-rich motifs were the most abundant motif type [44]. Moreover, comprehensive microsatellite assays were performed on the genomes of 14 fish species; the abundance and frequency of the repeats were counted, and (AC)- and (AT)-rich repeats were found to be dominant in different fish [45]. These results are similar to our observations in S. chinensis. It has also been reported that (AC)n is the most common dinucleotide motif in vertebrate genomes, with a frequency 2.3 times that of the second most common dinucleotide type, (AT)n [46]. This study is the first comprehensive exploration of microsatellites within the genome of S. chinensis and will help facilitate genetic diversity and conservation genetics studies.
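To make the motif classes used above concrete (for example, AC/GT groups AC, CA, GT and TG), the following sketch scans a sequence for perfect 2–6 bp repeats with a regular expression and assigns each hit to a strand- and rotation-independent class. It is a simplified stand-in for a dedicated SSR tool: the minimum repeat counts in MIN_REPEATS are assumed values chosen for illustration, not the settings used in this study, and all function names are hypothetical.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. ACAC)."""
    return motif not in (motif + motif)[1:-1]

def motif_class(motif):
    """Label such as 'AC/GT': smallest rotation over both strands and its reverse complement."""
    candidates = []
    for strand in (motif, reverse_complement(motif)):
        candidates.extend(strand[i:] + strand[:i] for i in range(len(strand)))
    canonical = min(candidates)
    return f"{canonical}/{reverse_complement(canonical)}"

def find_ssrs(sequence):
    """Return sorted (start, motif, repeats, class) tuples for perfect SSRs."""
    sequence = sequence.upper()
    found = []
    for size, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # skip e.g. AA or ACAC, which belong to shorter motifs
            repeats = len(match.group(0)) // size
            found.append((match.start(), motif, repeats, motif_class(motif)))
    return sorted(found)

toy = "TTG" + "CA" * 8 + "GGC" + "AAAT" * 5 + "CC"
ssrs = find_ssrs(toy)
class_counts = Counter(label for *_, label in ssrs)
```

Tallying `class_counts` over a whole genome would give per-class proportions of the kind reported above; compound and imperfect repeats are ignored in this sketch.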

The Microsatellites Characteristics Comparison among Cetacean Genomes
The genomes of B. mysticetus, B. acutorostrata, O. orca, T. truncatus, D. leucas, L. vexillifer, and N. asiaeorientalis were also mined for microsatellites (Table S4). Their microsatellite compositions were similar to that of S. chinensis (Figure 1A). Odontoceti possessed more dinucleotide microsatellites in total than Mysticeti, and the number of dinucleotide microsatellites of each motif type was also higher in Odontoceti (Figure 1A). In contrast, Mysticeti had more tetranucleotide microsatellites than Odontoceti (Figure 1A). The total number of microsatellites in cetaceans ranged from 70,000 to 120,000 (Figure 1A, Table S1). AC/GT and AT/AT-rich repeats were the main types of dinucleotide microsatellites (Figure 1B, Table S5).

Microsatellites Distribution in the S. chinensis Genome
We investigated the distribution of microsatellites in the coding and non-coding regions of the genome. Most microsatellites were located in non-coding regions (87,347; 99.5%), and only about 0.5% (410) were located in coding regions, which was expected, as coding regions represent only a very small fraction of the genome. According to previous reports, microsatellites are not evenly distributed throughout the genome [46–49]. In eukaryotic species, microsatellites are abundant in non-coding regions and rare in coding regions [50], e.g., about 7–10% in higher plants [51] and 9–15% in vertebrates [36,52]. In the genome of the swamp eel, about 1% of genes contain microsatellites in their coding regions [44]. In our study, only 0.8% of the genes contained microsatellites, a lower percentage than in vertebrates such as primates and rabbits [36,52], but close to that of the swamp eel [44].
Next, the GO terms of the genes containing microsatellites in their coding regions were analyzed based on the genome annotation information [29]. Microsatellites were distributed in 195 genes, which were mainly enriched in 53 GO terms. Terms related to histone modification and methylation were enriched, such as histone acetyltransferase activity (GO: 0004402), histone acetylation (GO: 0016573), histone modification (GO: 0016570), histone methyltransferase complex (GO: 0035097), histone methyltransferase activity (H3-K4 specific) (GO: 0042800), and histone-lysine N-methyltransferase activity (GO: 0018024) (Table 1). Others were associated with metabolic and biosynthetic processes, such as the regulation of the RNA metabolic process (GO: 0051252), the regulation of the nitrogen compound metabolic process (GO: 0051171), the regulation of the primary metabolic process (GO: 0080090), the regulation of the cellular metabolic process (GO: 0031323), the RNA metabolic process (GO: 0016070), etc. (Table 1). The GO annotation results suggest that the microsatellites were located in functional gene regions and may affect gene functions such as metabolic and methylation processes. DNA methylation is readily detected in mammals and is critical for normal development and genome stability. Repetitive sequences have been regarded as 'epigenetic elements' that can regulate gene expression [53,54]. In plants and mammals, a large proportion of methylated cytosines are found in repeat elements [55,56]. The methylation of repeat elements is thought to have accumulated persistently during evolution and to play an important role in the epigenetic regulation of genes [57]. Most studies have focused on transposable elements, such as Alu elements (Alu) and long interspersed element-1 (LINE-1) [58,59]. The relationship between methylation and simple repeats still needs further study.
Thus, the analysis of genes containing microsatellites and their functions could probably be used to reveal potential mechanisms and characteristics of gene expression and regulation in S. chinensis.

Validation of Microsatellites in S. chinensis
Several accurate and widely used techniques can be applied to identify cetacean species, such as morphology, fingerprinting, and DNA barcoding [60,61]. However, it remains challenging to investigate the genetic diversity of cetaceans because samples are difficult to obtain, especially for species threatened with extinction. S. chinensis is currently listed as Vulnerable by the IUCN and is classified as a first-class protected animal in the Chinese National Key Protected Wild Aquatic Animals List. Experiments on live animals are illegal and harmful to them. Cetacean strandings therefore provide a unique opportunity to gain biological and ecological knowledge of cetaceans [62]. Previous studies obtained microsatellite loci from stranded S. chinensis and identified them via PCR amplification [18,24]. We aligned these previously identified microsatellite markers to our genome, and all 46 markers could be mapped to the genome sequence. We then compared the positions and motif types of the published markers with those of the markers we isolated from the genome. Overall, 39 published markers matched our markers in both genomic location and motif type (+/−), whereas 7 published markers (SGATA47, SGATA35, SGATA28, SGATA25, SGATA18, SGATA13, SCA30) were not found among our markers at the corresponding genomic positions (Table 2). We therefore extracted the genomic regions to which these seven markers mapped and searched them for microsatellites using looser parameters: SSR flank length = 20 bp and SSR length ≥ 8 bp. For six of these markers (※), the motif and repeat number were the same as those reported; the exception was SGATA18 (#) (Table 2). As a result, 45 of 46 markers could be confirmed at the genomic scale, an overall verification rate of about 97.8%, indicating that the method of microsatellite isolation from the whole-genome sequence was comprehensive and accurate.
Table 2. Published microsatellite markers of S. chinensis [18,24], the corresponding microsatellite IDs, and the repeat types extracted from the S. chinensis genome. +/− denotes markers that mapped to the genome in the forward or reverse orientation and were present among the extracted markers; ※ denotes published markers that could be mapped to the genome but were not included in our results; these sequences were checked and found to be shorter SSRs, which can be isolated from the genome if looser parameters are set when scanning the genome sequence with MISA (v1.0) software; # denotes markers that could be mapped to the genome but were not included in our isolated microsatellites, even after changing the search parameters.

Polymorphic Microsatellites Detection Based on Male and Female Markers in S. chinensis
We identified potential polymorphic microsatellites between male and female individuals. These markers may play an important role in the evolution of sex or may be associated with sex chromosomes in mammals [63]. They may also provide candidate markers for assessing the population structure (male to female ratio) and social behavior of S. chinensis. A total of 86,432 microsatellites were developed based on the male S. chinensis genome [30] (Table S6). Among these markers, 65,571 (75.87%) were dinucleotide microsatellites, followed by 14,677 (16.98%) tetranucleotides (Table S7). Comparing the female and male markers yielded 15,678 candidate polymorphic microsatellites, that is, loci whose predicted PCR product lengths and repeat numbers differ between the two genomes (Table S8). Among them, 14,377 (91.71%) were dinucleotides, followed by 1063 (6.77%) tetranucleotides, 108 (0.69%) trinucleotides, 68 (0.43%) pentanucleotides, and 62 (0.40%) hexanucleotides (Table S9). The same trend of microsatellite composition was found in female and male S. chinensis. Dinucleotides and tetranucleotides were the predominant motif types among all the microsatellites isolated from the cetacean genome sequences (Figure 1A). These markers are of great value for conducting population surveys of S. chinensis in the field.
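One simple way to flag such candidates is sketched below. The text does not describe how loci were matched between the two genomes, so this sketch assumes, purely for illustration, that a locus is identified by its motif class together with identical left and right flanking sequences, and that a locus is a candidate when its repeat number differs between the female and male records. The record layout and the function name are hypothetical.

```python
def polymorphic_candidates(female_ssrs, male_ssrs):
    """Return loci present in both marker sets with different repeat numbers.

    Each record is a dict with keys: 'left', 'motif', 'right', 'repeats'.
    Loci are matched on (left flank, motif class, right flank), an
    assumption made for this sketch.
    """
    male_by_locus = {
        (rec["left"], rec["motif"], rec["right"]): rec for rec in male_ssrs
    }
    candidates = []
    for rec in female_ssrs:
        locus = (rec["left"], rec["motif"], rec["right"])
        partner = male_by_locus.get(locus)
        if partner is not None and partner["repeats"] != rec["repeats"]:
            candidates.append((locus, rec["repeats"], partner["repeats"]))
    return candidates

# Hypothetical toy records, not data from the study.
female = [
    {"left": "GATTC", "motif": "AC/GT", "right": "TTGCA", "repeats": 12},
    {"left": "CCGTA", "motif": "AAAT/ATTT", "right": "GGACT", "repeats": 6},
]
male = [
    {"left": "GATTC", "motif": "AC/GT", "right": "TTGCA", "repeats": 15},
    {"left": "CCGTA", "motif": "AAAT/ATTT", "right": "GGACT", "repeats": 6},
]
candidates = polymorphic_candidates(female, male)
```

Because each genome comes from a single individual, loci flagged this way are only candidates; their polymorphism would need to be confirmed by genotyping additional samples.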

Specific Markers of Male S. chinensis
Until now, no cetacean genome assembly has included a single, complete Y chromosome sequence. Only the bottlenose dolphin has candidate Y chromosome-associated scaffolds. The bottlenose dolphin is a close relative of S. chinensis [64]. Therefore, to obtain Y chromosome-specific microsatellites in the genome of male S. chinensis, we aligned its unanchored scaffold sequences, that is, those not assigned to autosomes or the X chromosome, with the candidate Y chromosome-associated scaffold sequences of the bottlenose dolphin. Based on the alignment results, we obtained candidate Y chromosome contigs in the S. chinensis genome. This is the first time that Y chromosome region sequences of this protected species have been identified. Finally, 1349 microsatellites were isolated from the candidate contigs, which can be applied as candidate markers for male dolphins (Table S10). These markers have promising applications in the sex identification of S. chinensis after testing on additional samples. Since cetaceans have highly conserved genomes, these male-specific markers may also be useful in sex ratio investigations of S. chinensis and other cetaceans.

Conclusions
We explored and identified useful microsatellite markers in S. chinensis on a genome-wide scale and designed primers for these markers. A comparison with the published marker sequences showed that the markers developed in this study had high reliability. Microsatellites in the S. chinensis genome are mainly distributed in non-coding regions, and only approximately 0.5% are distributed in coding regions. Their associated genes suggest potential impacts on methylation in dolphins. Dinucleotide microsatellites are the most abundant, and this motif type is dominated by the AC-rich type. Cetacean species share the same microsatellite composition characteristics, which possibly reflects the conserved nature of cetacean genomes. The study also yielded polymorphic microsatellite data for male and female S. chinensis, which provide a valuable database for the investigation and demography of dolphin populations. Moreover, the male-specific marker mining provided useful candidate markers for the sex identification of S. chinensis, as well as other cetacean species. Given that many marine mammals are at risk of extinction, there is an urgent need to develop diverse conservation tools. The extensive and reliable genome-wide microsatellite information identified in this study is expected to be extended to other cetacean species and to play an important role in their population genetics studies and conservation.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jmse10060834/s1, Table S1: The composition and classification of microsatellites in cetaceans;

Institutional Review Board Statement:
As we only used published genome sequence data for the analysis, the study did not require ethical approval.

Informed Consent Statement:
Not applicable, because the study did not involve humans.

Data Availability Statement:
We used published data for the analysis in this study. The data resources are explained and described in the manuscript. The results and marker sequences created by this study are all provided in the Supplementary Materials.