Microsatellite Characteristics of Silver Carp (Hypophthalmichthys molitrix) Genome and Genetic Diversity Analysis in Four Cultured Populations

Hypophthalmichthys molitrix is one of the four most important fish in China and has high breeding potential. However, few simple sequence repeat (SSR) markers have been developed at the genome level for genetic diversity analysis of H. molitrix. In this study, the distribution characteristics of SSRs in the assembled H. molitrix genome were analyzed, and new markers were developed to preliminarily evaluate the genetic diversity of four breeding populations. A total of 368,572 SSRs were identified in the H. molitrix genome. The total length of SSRs was 6,492,076 bp, accounting for 0.77% of the total length of the genome sequence. The total frequency and total density were 437.73 loci/Mb and 7713.16 bp/Mb, respectively. Among the di- to hexa-nucleotide repeat types, di-nucleotide repeats dominated (204,873, 55.59%), and AC/GT was the most abundant motif. The number of SSRs on each chromosome was positively correlated with chromosome length. Thirteen pairs of markers were developed and used to analyze the genetic diversity of four cultivated populations in Hubei Province. The results showed that the genetic diversity of the four populations was low: the ranges of the number of alleles (Na), number of effective alleles (Ne), observed heterozygosity (Ho), and Shannon's information index (I) were 3.538–4.462, 2.045–2.461, 0.392–0.450, and 0.879–0.954, respectively. Genetic variation occurred mainly among individuals within populations (95.35%). The UPGMA tree and Bayesian analysis showed that the four populations could be divided into two branches. Therefore, the genome-wide SSRs were effective for genetic diversity analysis of H. molitrix.


Introduction
Silver carp (H. molitrix), one of the four dominant fish in China, feed mainly on phytoplankton and live in the upper and middle layers of the water column. They are widely distributed in ponds, lakes, rivers, and other major freshwater ecosystems in China and Asia [1]. Owing to its fast growth, low breeding cost, high economic benefit, and role in purifying water, the species has become an important freshwater economic fish [2]. Before the ten-year fishing ban in the Yangtze River was implemented, more attention was paid to the genetic diversity and genetic structure of natural populations than to the genetic background of breeding populations. Moreover, breeding populations might affect the genetic diversity and adaptability of natural populations through the hatchery release of H. molitrix [3]. Therefore, it is necessary to develop effective genetic markers to evaluate the genetic diversity of H. molitrix breeding populations.

Sample Collections and DNA Extraction
One hundred and twenty samples from four cultured H. molitrix populations were collected in Hubei Province in 2021, from Shishou (SS), Wuhan (WH), Xiaochang (XC), and Yaowan (YW) (Table 1). Tail fins were sampled and stored in anhydrous ethanol at −20 °C. Genomic DNA was extracted using a high-salt method [16]. After extraction, DNA quality was checked by 1% agarose gel electrophoresis with a UV gel imaging system. DNA concentrations were measured with a NanoPhotometer® spectrophotometer (IMPLEN, München, Germany), and samples were diluted with sterile double-distilled water to 50 ng/µL (Table 1).

Identification of Genome-Wide SSRs
MISA 2.1 software (Leibniz institute, IPK, Germany. http://pgrc.ipk-gatersleben.de/misa/ (accessed on 16 April 2021)) was used to search for SSRs in the H. molitrix genome. The SSR screening criteria were as follows: di-nucleotide motifs repeated at least 6 times, tri-nucleotide motifs at least 5 times, tetra-nucleotide motifs at least 4 times, and penta- and hexa-nucleotide motifs at least 3 times. Two SSRs separated by an interval of less than 100 bp were defined as a compound SSR. Because of complementary base pairing, motifs that are equivalent by rotation or reverse complementation were merged into a single representative; for example, the di-nucleotide class AC comprises AC, CA, GT, and TG. Tri-, tetra-, penta-, and hexa-nucleotide motifs were merged in the same way.
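The screening rules above can be illustrated with a small regular-expression scan. This is only a minimal sketch, not the MISA implementation: it assumes the thresholds are minimum copy numbers (6, 5, 4, 3, 3 for di- to hexa-nucleotide motifs), it finds perfect repeats only, and it does not merge neighbouring loci into compound SSRs. The names `MIN_REPEATS`, `canonical_motif`, and `find_ssrs` are hypothetical.

```python
import re

# Assumed minimum copy number for each motif length (di- to hexa-nucleotide).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 3, 6: 3}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Return one representative for a motif, its rotations, and the
    rotations of its reverse complement (AC, CA, GT and TG all map to AC)."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    candidates = []
    for m in (motif, revcomp):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def find_ssrs(sequence):
    """Return sorted (start, end, canonical motif, copy number) tuples
    for perfect SSRs; coordinates are 0-based, end-exclusive."""
    sequence = sequence.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (for example AA, or ATAT counted as a tetra-nucleotide).
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            copies = len(match.group(0)) // unit_len
            found.append((match.start(), match.end(),
                          canonical_motif(motif), copies))
    return sorted(found)


if __name__ == "__main__":
    # Toy sequence, not taken from the H. molitrix assembly.
    toy = "GGG" + "AC" * 7 + "TTGCA" + "AAT" * 5 + "CC"
    for hit in find_ssrs(toy):
        print(hit)
```

Counting the canonical motifs returned by such a scan gives per-class totals of the kind reported for AC, AT, AG, and CG in the results.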

Primer Design for Genome-Wide SSRs
SSR primers were designed with the Primer 3.0 software from the flanking sequences of the SSRs. The design principles were as follows: primers were located 50–80 bases from the core sequence, the PCR amplification product was 100–400 bp, the annealing temperature was 50–60 °C, and the GC content was 40–60%.

Verification of SSRs Using PCR Amplification
Ninety-six pairs of SSR primers targeting motifs of three bases and above were designed and synthesized by Tianyi Huiyuan Biotech Company, Wuhan, China (Table S1). The 10 µL PCR reaction system contained 5.0 µL of 2× Taq PCR Master Mix, 1 µL of template DNA (20 ng/µL), 0.5 µL of each primer (10 µmol/L), and 3.0 µL of DNase-/RNase-free deionized water. A two-stage amplification program was used. After pre-denaturation at 95 °C for 5 min, the first stage comprised 10 cycles in which the annealing temperature decreased gradually from 62 °C to 52 °C. The second stage comprised 25 cycles with an annealing temperature of 52 °C. In both stages, the denaturation (95 °C) and extension (72 °C) steps each lasted 30 s. After the second stage, a final extension was carried out at 72 °C for 20 min. The products of the ninety-six primer pairs were checked by 1% agarose gel electrophoresis. Finally, 13 SSR markers were obtained (Table 2). The PCR products were subjected to SSR analysis on an ABI 3730xl instrument, and the genotype data were read using GeneMarker (Applied Biosystems).
The software cited as [17] was used to calculate the number of alleles (Na), the number of effective alleles (Ne), expected heterozygosity (He), observed heterozygosity (Ho), Shannon's information index (I), and Nei's genetic distance. The polymorphism information content (PIC) was calculated with Cervus 3.0 software (Kruuk, Australian National University, Australia) [18]. The genetic differentiation index (Fst) of each population was calculated using Arlequin version 3.5 [19], and analysis of molecular variance (AMOVA) was performed.
The phylogenetic tree was constructed from Nei's genetic distance by the unweighted pair-group method with arithmetic mean (UPGMA) using MEGA 5.0 [20]. Structure v2.3.4 [21] was used to evaluate the genetic relationships among populations. The number of clusters (K value) was inferred with the Bayesian approach under the admixture model. The burn-in period of the Markov chain Monte Carlo (MCMC) run was set to 50,000 iterations, K ranged from 1 to 8, and each K value was run 20 times. The results were submitted to Structure Harvester (http://taylor0.biology.ucla.edu/struct harvest/ (accessed on 13 April 2022)) to determine the best K value, and CLUMPP1.1.2 software (Rosenberg, Oxford University Press, USA) [22] was then used to combine the replicate clustering runs. Finally, DISTRUCT1.1 [23] was used for visualization.
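The UPGMA step can be sketched in a few lines of pure Python. This is an illustrative reimplementation under stated assumptions, not the MEGA procedure: the function name `upgma`, the nested-tuple tree format, and the toy distance matrix (labels A to D) are hypothetical and do not represent the distances estimated for the four populations.

```python
from itertools import combinations


def upgma(names, matrix):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns a nested tuple (left, right, height); leaves are the names,
    and height is half the distance at which the two subtrees merge.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        d_ab = dist[frozenset((a, b))]
        # UPGMA rule: the distance from the merged cluster to any other
        # cluster is the size-weighted mean of the two old distances.
        for c in clusters:
            dist[frozenset((next_id, c))] = (
                n_a * dist[frozenset((a, c))] + n_b * dist[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), n_a + n_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    # Hypothetical distances between four toy groups.
    labels = ["A", "B", "C", "D"]
    d = [
        [0.00, 0.02, 0.04, 0.10],
        [0.02, 0.00, 0.04, 0.10],
        [0.04, 0.04, 0.00, 0.10],
        [0.10, 0.10, 0.10, 0.00],
    ]
    print(upgma(labels, d))
```

In the toy matrix, A and B join first, C joins them next, and D forms its own branch, which is the same shape of result as one population separating from the other three.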

Identification of SSRs in the H. molitrix Genome
A total of 368,572 SSRs were identified in the 842.01 Mb genome of H. molitrix. The total length of the identified SSRs was 6,492,076 bp, accounting for 0.77% of the total length of the whole genome. The average length of SSRs was 84.66 bp, the frequency was 437.73 loci/Mb, and the density was 7713.16 bp/Mb (Table 3). Among the di-nucleotide repeats, AC (89,924) was the most abundant motif, accounting for 43.89%, followed by AT (40.6%) and AG (15.4%); CG had the lowest proportion (0.1%). The most frequent tri-nucleotide motif was AAT (24,409), accounting for 64.15%, followed by AAC (12.34%) and AAG (5.36%); the remaining motifs were relatively rare. Among the tetra-, penta-, and hexa-nucleotide repeats, the most abundant motifs were AAAT (24%), AAAAT (28.2%), and AAAAAT (31.6%), respectively (Figure 1).
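The frequency, density, and genome share quoted above follow from three simple ratios, sketched below. The function name `ssr_summary` is hypothetical, and 1 Mb is assumed to be 1,000,000 bp. With the rounded genome size of 842.01 Mb, the frequency and genome share match the reported values, whereas the density comes out near 7710 bp/Mb, slightly below the reported 7713.16 bp/Mb; the small gap presumably reflects a slightly different denominator, which the text does not specify.

```python
def ssr_summary(n_ssrs, total_ssr_bp, genome_mb):
    """Return (frequency in loci/Mb, density in bp/Mb, percent of genome)."""
    frequency = n_ssrs / genome_mb
    density = total_ssr_bp / genome_mb
    # Assumes 1 Mb = 1,000,000 bp.
    percent = 100 * total_ssr_bp / (genome_mb * 1_000_000)
    return frequency, density, percent


if __name__ == "__main__":
    # Values taken from the text above.
    frequency, density, percent = ssr_summary(368_572, 6_492_076, 842.01)
    print(f"{frequency:.2f} loci/Mb, {density:.2f} bp/Mb, {percent:.2f}% of genome")
```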

The Distributions of Copy Numbers in Different SSR Repeat Types in H. molitrix Genome
In the H. molitrix genome, the copy number of di-nucleotide repeats was concentrated between 6 and 29, which accounted for 95.74% of all di-nucleotide SSRs; SSRs with 6 copies were the most abundant (47,731; 23.3%). The copy number of tri-nucleotide repeats was mainly concentrated between 5 and 10, accounting for 97.72% of all tri-nucleotide SSRs, of which those with 5 copies were the most abundant (18,401; 48.36%). The copy number of tetra-nucleotide repeats was largely concentrated between 4 and 8, accounting for 95.64% of all tetra-nucleotide SSRs, of which those with 4 copies were the most abundant (35,997; 51.42%). The copy number of penta-nucleotide repeats was mostly concentrated between 3 and 5, accounting for 96.37% of all penta-nucleotide SSRs, of which those with 3 copies (33,442) accounted for 74.45%. Hexa-nucleotide repeats with 3–5 copies dominated, accounting for 98.9% of all hexa-nucleotide SSRs; those with 3 copies were the most abundant (9683; 90.34%) (Figure 2, Table S3).

Distribution of SSRs on Chromosomes
The total length of the 24 assembled chromosomes of H. molitrix accounted for 95.87% of the total assembled sequence. A total of 260,712 SSRs were identified on the chromosomes, of which the largest number (15,637) was located on chromosome 1, accounting for 6%, followed by chromosome 2 (15,610) and chromosome 4 (14,548), accounting for 5.98% and 5.58%, respectively. Chromosome 24 carried the fewest SSRs (7650), accounting for 2.93% (Figure 3, Table S4). Linear regression analysis performed with SPSS showed that the number of SSRs was positively correlated with chromosome length (R = 0.969, p < 0.01).
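For a simple linear regression with one predictor, the reported R equals the Pearson correlation coefficient between chromosome length and SSR count. The sketch below computes it in pure Python rather than SPSS; the function name `pearson_r` is hypothetical, and the chromosome lengths and counts are toy values, not the figures in Table S4.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical chromosome lengths (Mb) and SSR counts.
    lengths_mb = [30, 35, 40, 45, 50]
    ssr_counts = [9000, 10400, 12100, 13400, 15100]
    print(f"R = {pearson_r(lengths_mb, ssr_counts):.3f}")
```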

Screening of Polymorphic SSR Sites
A total of 56 alleles were detected at the 13 SSR markers. The observed number of alleles (Na) ranged from 2 to 7, the effective number of alleles (Ne) from 1.052 to 4.765, observed heterozygosity (Ho) from 0.017 to 0.683, expected heterozygosity (He) from 0.049 to 0.800, Shannon's information index (I) from 0.133 to 1.747, and polymorphism information content (PIC) from 0.048 to 0.768 (Table 5).
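The per-locus statistics listed above can be computed from diploid genotypes with standard textbook formulas, sketched below. This is an illustration under assumptions, not the output of the programs used in the study: He is the uncorrected form 1 − Σp², PIC follows the common formula 1 − Σp² − Σ2pᵢ²pⱼ² (i < j), and the function name `locus_statistics` and the allele sizes are hypothetical.

```python
from collections import Counter
from math import log


def locus_statistics(genotypes):
    """Compute Na, Ne, Ho, He, I and PIC for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)  # sum of squared allele frequencies
    # Cross term of the PIC formula: sum over i < j of 2 * p_i^2 * p_j^2.
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return {
        "Na": len(freqs),                                   # observed alleles
        "Ne": 1 / homozygosity,                             # effective alleles
        "Ho": sum(a != b for a, b in genotypes) / n,        # observed heterozygosity
        "He": 1 - homozygosity,                             # expected heterozygosity
        "I": -sum(p * log(p) for p in freqs),               # Shannon's index
        "PIC": 1 - homozygosity - cross,
    }


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) for four individuals at one locus.
    toy_genotypes = [(180, 180), (180, 184), (184, 184), (180, 184)]
    for name, value in locus_statistics(toy_genotypes).items():
        print(name, round(value, 3))
```

Population-level values such as those reported per population are then averages of these per-locus statistics over the 13 markers.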

Population Genetic Diversity Analysis
Na in the four H. molitrix populations ranged from 3.538 (YW) to 4.462 (XC), with an average of 4.116; Ne ranged from 2.045 (YW) to 2.461 (WH), with an average of 2.307; Ho ranged from 0.392 to 0.450, with an average of 0.4138; and He ranged from 0.402 (XC) to 0.504 (WH), with an average of 0.457. The mean He was greater than the mean Ho, indicating a deficit of heterozygotes (an excess of homozygotes) relative to expectation. Shannon's information index (I) ranged from 0.879 to 0.954, with an average of 0.911. The fixation index (Fst) within populations ranged from 0.084 (SS) to 0.178 (YW) (Table 6).

Genetic Differentiation in Four Populations
Analysis of molecular variance (AMOVA) was used to detect genetic differentiation among populations. The results showed that 95.35% of the genetic variation was within populations, while only 4.65% was among populations. The Fst value among populations was 0.04650 (p < 0.001) (Table 7). The genetic differentiation index was highest between the YW and WH populations and lowest between XC and SS (Table 8). Based on Nei's genetic distance, a population phylogenetic tree was constructed by the UPGMA method. The SS, XC, and WH populations clustered into one clade, and the YW population formed another (Figure 4). Structure Harvester indicated a best K value of 2, suggesting that the individuals of the four populations are most likely assigned to two genetic clusters (Table S5). Different colors in the clustering diagram represent different groups. The results showed that the YW population differed markedly from the SS, WH, and XC populations, so the four populations were divided into two groups (Figure 5).
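As a consistency check on the AMOVA figures above, the among-population Fst in a two-level AMOVA is the share of the total variance that lies among populations. The worked equation below assumes this standard definition and uses the two reported percentages as variance shares; the subscripts "among" and "within" are labels chosen here for illustration.

```latex
F_{ST} = \frac{\sigma^2_{\text{among}}}{\sigma^2_{\text{among}} + \sigma^2_{\text{within}}}
       = \frac{4.65}{4.65 + 95.35} = 0.0465
```

The result agrees with the reported value of 0.04650.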
Studies have shown that the longer a species has evolved, the more SSRs with short repeat units its genome contains [34]. The dominant SSR type in the H. molitrix genome is the di-nucleotide repeat, which is consistent with most aquatic organisms, such as Pelteobagrus fulvidraco [35], Hemibagrus wyckioides [36], and L. maculatus [31]. This may be related to the evolutionary time of the species.
Among the di-nucleotide repeat types, AC/GT was the most abundant, which is consistent with most vertebrates [37]. The distribution of SSRs differs among species, but the G/C content of SSRs is generally low [26]. AAT/ATT, AAAT/ATTT, AAAAT/ATTTT, and AAAAAT/ATTTTT are the dominant tri-, tetra-, penta-, and hexa-nucleotide repeat types, respectively. A/T bases accounted for the majority of SSRs in the whole genome, while G/C bases were less common. This is similar to the results for H. wyckioides [36], C. carpio [38], and Misgurnus anguillicaudatus [39]. One proposed explanation is that the cytosine (C) in CpG di-nucleotides is usually methylated and is then deaminated to thymine (T) [40]. In addition, A/T-rich sequences are prone to slippage during replication, and G/C content is negatively correlated with the probability of replication slippage [41].
The copy number of SSRs in the H. molitrix genome was mainly concentrated between 3 and 29. The number of SSRs gradually decreased as the copy number of the repeat unit increased, which is consistent with the distribution of SSRs in most genomes. The number of SSRs also decreased as repeat length increased, because the longer the repeat, the higher the probability of mutation [42]. Many studies have shown that the number of SSRs on a chromosome is correlated with its length. Linear analysis showed that chromosome length in H. molitrix was positively correlated with the number of SSRs (R = 0.969, p < 0.01): the longer the chromosome, the higher the microsatellite content [43]. In contrast, the frequency and density of SSRs were not correlated with chromosome length, a pattern that a study of 14 fish species related to the long-term evolution of species [30].
Genetic diversity is an important index of population adaptability and can be estimated from the observed number of alleles, the effective number of alleles, observed heterozygosity, and expected heterozygosity [44,45]. Many studies have shown that unintentional parental selection and inbreeding during reproduction can reduce the genetic diversity of populations. Our results showed that the population means of Na, Ne, Ho, He, and I ranged over 3.538–4.462, 2.045–2.461, 0.392–0.450, 0.402–0.504, and 0.879–0.954, respectively, indicating that the genetic diversity of the four populations was low. This is consistent with findings for cultivated H. molitrix populations in Guangxi [46]. The low genetic diversity of the four cultivated H. molitrix populations in this study is unfavorable to the protection of germplasm resources, and more scientific breeding measures should be adopted during reproduction to improve genetic diversity.
Fst is usually used to evaluate genetic differentiation among populations [47]. When 0 < Fst < 0.05, there is no differentiation among populations; when 0.05 < Fst < 0.15, there is moderate differentiation; and when 0.25 < Fst < 1, there is high differentiation [48]. The Fst values between the YW population and the other three populations (SS, WH, and XC) were 0.05402, 0.08709, and 0.06319, respectively, so the YW population exhibited moderate differentiation from the other three populations. This is consistent with the results of the UPGMA tree and the cluster diagram.

Conclusions
In conclusion, this study analyzed the number, frequency, distribution, and type of SSRs in the whole genome of H. molitrix, screened and developed SSR markers at the whole-genome level for the first time, and then analyzed the genetic diversity of four breeding populations. The genetic diversity of these four populations was found to be low. Therefore, developing new SSR markers from the H. molitrix genome will provide a basis for genetic diversity analysis, the formulation of more scientific breeding measures, the protection and development of germplasm resources, and the development of sustainable aquaculture.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13071267/s1, Table S1: 96 pairs of primer information, Table S2: Summary of SSR motifs and repeats, Table S3: The distributions of copy numbers in different SSR repeat types in the H. molitrix genome, Table S4: Microsatellite distribution characteristics on 24 chromosomes, Figure S1: The curve of change of Delta K with the changing K value.

Institutional Review Board Statement:
The animal study was approved by the animal care regulations of the Yangtze River Fisheries Research Institute, Chinese Academy of Fishery Sciences (approval number 2020098).

Informed Consent Statement: Not applicable.
Data Availability Statement: The experimental data involved in this article can be obtained from the corresponding author.