Development of Novel SNP Assays for Genetic Analysis of Rare Minnow (Gobiocypris rarus) in a Successive Generation Closed Colony

The complex genetic architecture of closed colonies during successive passages poses a significant challenge to understanding their genetic background. Research on the dynamic changes in genetic structure during the establishment of a new closed colony is limited. In this study, we developed 51 single nucleotide polymorphism (SNP) markers for the rare minnow (Gobiocypris rarus) and conducted genetic diversity and structure analyses of five successive generations of a closed colony using 20 SNPs. The ranges of mean observed (Ho) and expected (He) heterozygosity across the five generations were 0.4547–0.4983 and 0.4445–0.4644, respectively. No significant differences in the effective number of alleles (Ne), Ho, or He (p > 0.05) were observed among the five closed-colony generations, indicating well-maintained heterozygosity. F-statistics analysis revealed a relatively stable genetic structure of the closed colony. Furthermore, the genetic distance between newer and older generations increased with the number of breeding generations in the closed colony. We also compared the results of SNP and SSR marker analyses in the same samples; the SNP results confirmed previous findings obtained with microsatellite markers. Our results may provide useful information for establishing genetic variability control standards and for restoring the wild population of the rare minnow, and may serve as a reference for other laboratory fish.


Introduction
Research using laboratory animals requires explicit knowledge of their genetic background [1]. At present, technologies for monitoring genetic variability in major closed-colony organisms are mainly based on morphology, biochemical markers, or microsatellite markers (SSRs) [2][3][4]. However, morphological and biochemical markers provide limited genetic information and cannot identify single-base mutations. Because the genetic background of laboratory animals is difficult to evaluate comprehensively, molecular marker technologies such as SSRs, which provide sufficient genetic information, have been introduced into genetic studies and are now used to detect genetic diversity in laboratory animals [5]. Single nucleotide polymorphisms (SNPs), a third-generation DNA molecular marker technology, are an excellent tool for population-based genetic research [6,7]; however, research using SNPs to evaluate genetic variability in closed colonies is still inadequate. Advantages of SNPs include genetic stability, high polymorphism,

SNP Prediction and Sequence Annotation
Raw reads of the rare minnow transcriptome were downloaded from GenBank and mapped to the (unpublished) reference genome of the rare minnow using Bowtie [31]. Potential SNPs were identified manually by visual inspection of the alignments. For SNP validation, 2000 bp sequences surrounding the predicted SNP loci were captured from the reference genome.
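The capture of 2000 bp sequences around each predicted SNP can be illustrated with a short Python sketch. The source does not say how the window was positioned, so this assumes, hypothetically, a window centred on the SNP, 0-based coordinates, and clipping at contig ends; `flank_window` and the toy contig are illustrative names, not part of the original pipeline.

```python
def flank_window(contig: str, snp_pos: int, length: int = 2000) -> tuple[int, int, str]:
    """Return (start, end, sequence) of a window of up to `length` bp around a
    0-based SNP position, centred where possible and clipped to the contig."""
    if not 0 <= snp_pos < len(contig):
        raise ValueError("SNP position lies outside the contig")
    half = length // 2
    start = max(0, snp_pos - half)
    end = min(len(contig), start + length)
    # If the window was truncated at the 3' end, shift it back toward the 5' end
    start = max(0, end - length)
    return start, end, contig[start:end]


contig = "ACGT" * 750  # hypothetical 3000 bp contig
start, end, seq = flank_window(contig, 2900)
print(start, end, len(seq))
```

Near a contig end the window is shifted rather than shortened, so every captured sequence keeps the full length whenever the contig is long enough.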
All captured sequences were aligned against the NCBI nucleotide (nt) database with BLASTN to retrieve the genes with the highest sequence similarity, together with their functional annotations. All sequences were deposited in GenBank under project PRJNA658832.
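Retrieving the most similar gene for each captured sequence amounts to keeping the best BLASTN hit per query. The sketch below assumes the search was run with tabular output (`-outfmt 6`, whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and ranks hits by bit score, breaking ties by the lower E-value. The source does not state the output format or ranking rule used; the function name and the example rows (including the subject IDs `hitA` to `hitC`) are hypothetical.

```python
import csv
import io


def best_hits(tabular: str) -> dict[str, tuple[str, float, float]]:
    """Map each query ID to (subject ID, e-value, bit score) of its top hit."""
    best: dict[str, tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        current = best.get(query)
        # Higher bit score wins; on a tie, the lower e-value wins
        if current is None or (bitscore, -evalue) > (current[2], -current[1]):
            best[query] = (subject, evalue, bitscore)
    return best


rows = "\n".join([
    "Seq31\thitA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-90\t350",
    "Seq31\thitB\t91.0\t180\t16\t1\t5\t184\t30\t209\t1e-60\t250",
])
print(best_hits(rows))
```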
All primers were designed using Primer Premier 5.0 software (PREMIER Biosoft International, Palo Alto, CA, USA) to minimize self- and cross-dimerization and hairpin formation. The primer sequences, annealing temperatures (Tm), lengths, and positions are listed in Supplementary Table S1.

SNP Validation and Genotyping
Six wild rare minnow samples were used to validate the SNPs using the SNaPshot method. A multiplex PCR was performed in a 20 µL reaction volume comprising 50 ng of DNA template, 7.5 µL of 2× PCR master mix, and 2 µL of multiplex primers (10 µM). The amplification program was as follows: initial denaturation at 94 °C for 3 min, followed by 35 cycles of denaturation at 94 °C for 15 s, annealing at 55 °C for 15 s, and extension at 72 °C for 30 s, with a final extension at 72 °C for 3 min.
After PCR, the products were purified with ExoI and FastAP to remove residual primers and unincorporated dNTPs. The purification mixture comprised 4 µL PCR product, 0.2 µL ExoI, 0.8 µL FastAP, 0.7 µL ExoI buffer, and 4.7 µL ddH2O, and was incubated at 37 °C for 15 min and then at 80 °C for 15 min to inactivate the enzymes. Finally, a single-base extension reaction was performed in a mixture comprising 2 µL purified PCR product, 1 µL SNaPshot Mix, and 1 µL extension primers (10 µM). The thermal program was as follows: denaturation at 96 °C for 1 min, followed by 30 cycles of denaturation at 96 °C for 10 s, annealing at 52 °C for 5 s, and extension at 60 °C for 30 s. Samples were analyzed on an ABI 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA, USA). Twenty highly variable SNPs were selected for further genotyping of the five closed-colony generations using the method described above.
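The source does not describe how SNaPshot peaks were scored. As a hypothetical illustration, a genotype at one SNP can be called from the dye-colour peak heights: one dominant peak gives a homozygote, and two peaks of comparable height give a heterozygote. The sketch uses the SNaPshot colour convention (blue = G, green = A, black = C, red = T) for the base added to the extension primer. The thresholds `min_height` and `het_ratio` are invented for illustration. The sketch ignores locus assignment by fragment size in the multiplex, and for primers annealing to the reverse strand, the called base is the complement of the forward-strand allele.

```python
from typing import Optional

# SNaPshot dye-to-base convention for the incorporated ddNTP
DYE_TO_BASE = {"blue": "G", "green": "A", "black": "C", "red": "T"}


def call_genotype(peaks: dict[str, float], min_height: float = 100.0,
                  het_ratio: float = 0.3) -> Optional[str]:
    """Call a diploid genotype from peak heights (RFU) keyed by dye colour.

    min_height and het_ratio are hypothetical thresholds, not validated values.
    """
    ranked = sorted(((h, DYE_TO_BASE[dye]) for dye, h in peaks.items()), reverse=True)
    if not ranked or ranked[0][0] < min_height:
        return None  # no peak strong enough to call
    top_height, top_base = ranked[0]
    if len(ranked) > 1:
        second_height, second_base = ranked[1]
        # A second peak that is tall enough relative to the first indicates a heterozygote
        if second_height >= min_height and second_height / top_height >= het_ratio:
            return "/".join(sorted((top_base, second_base)))
    return f"{top_base}/{top_base}"


print(call_genotype({"green": 1500, "blue": 1200}))
```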

Genetic Analysis
The effective number of alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), F-statistics, and genetic distance were calculated for each SNP locus using Popgene version 1.32. Polymorphic information content (PIC) was calculated using the formula described by Botstein et al. [32]. Hardy-Weinberg equilibrium at each SNP locus was tested using the chi-square test in each of the six groups. A genetic fingerprint profile was drawn using Heml version 1.0.3.7. To depict divergence among the five closed-colony generations and the wild group, a phylogenetic tree was constructed with the UPGMA method using Popgene version 1.32 [32] and exported with Mega 5.1 software [33].
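The per-locus statistics named above can be sketched from their textbook definitions: allele frequencies p_i, Ne = 1/Σp_i², Ho as the proportion of heterozygous individuals, He with Nei's small-sample correction 2n/(2n − 1)·(1 − Σp_i²), and Botstein's PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². This is a minimal illustration, not a re-implementation of Popgene; in particular, the choice of the unbiased He estimator is an assumption, and Popgene's reported values may use a different estimator. For a biallelic SNP, PIC reaches its maximum of 0.375 when both alleles have a frequency of 0.5.

```python
from collections import Counter
from itertools import combinations


def locus_diversity(genotypes: list[tuple[str, str]]) -> dict[str, float]:
    """Diversity statistics for one locus from diploid genotypes."""
    n = len(genotypes)                       # number of individuals
    total = 2 * n                            # number of sampled alleles
    counts = Counter(a for g in genotypes for a in g)
    freqs = [c / total for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    ne = 1.0 / sum_sq                        # effective number of alleles
    ho = sum(a != b for a, b in genotypes) / n
    he = (total / (total - 1)) * (1.0 - sum_sq)  # Nei's unbiased expected heterozygosity
    # Botstein et al. PIC: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2
    pic = 1.0 - sum_sq - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {"Ne": ne, "Ho": ho, "He": he, "PIC": pic}


# Hypothetical genotypes at one biallelic SNP
print(locus_diversity([("A", "A"), ("A", "G"), ("A", "G"), ("G", "G")]))
```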

SNP Identification and Annotation
Fifty sequences containing potential SNPs were identified using bioinformatic methods, and these 50 sequences were validated with the SNaPshot method in six wild rare minnow samples. A total of 51 SNPs in 29 sequences were confirmed (Supplementary Table S1). Of these 29 sequences, 14 contained a single SNP and 15 contained two or more SNPs (Supplementary Figure S1). Notably, five SNPs were identified in Seq31. All confirmed SNPs were biallelic except Seq26-2, which had three alleles.
We used BLAST to annotate the SNPs; 33 SNPs were annotated, a few of which are located in important functional genes (Supplementary Table S1).

Genetic Diversity of Closed Colony and Wild Samples
We selected 20 highly variable SNPs from the 51 SNPs to analyze genetic diversity in the five closed-colony generations and the wild group using the SNaPshot method. The five generations of the closed colony were F1, F3, F4, F5, and F10. In all six groups, all SNPs were moderately or highly polymorphic (PIC > 0.25). The ranges of the effective number of alleles (Ne), mean observed heterozygosity (Ho), and mean expected heterozygosity (He) across the six groups were 1.8183–1.8745, 0.4333–0.4983, and 0.4445–0.4758, respectively (Table 1). Hardy-Weinberg equilibrium at each SNP locus was tested using the chi-square test in each of the six groups. The mean Hardy-Weinberg equilibrium p-values of the six groups ranged from 0.3716 to 0.5415 (Table 2), consistent with random mating and providing no evidence of inbreeding. Genotype frequencies at all 20 SNP loci conformed to Hardy-Weinberg equilibrium in the wild group and the F1 generation, whereas one locus each in the F5 and F10 generations, two loci in the F4 generation, and three loci in the F3 generation deviated from Hardy-Weinberg equilibrium (Table 2).
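For a biallelic SNP, the chi-square test compares observed genotype counts with the Hardy-Weinberg expectations np², 2npq, and nq², with one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency). The Python sketch below is a generic version of this test, not the Popgene implementation, and it assumes a biallelic locus only; the triallelic Seq26-2 would need more genotype classes and degrees of freedom. The function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for HWE at a biallelic locus (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, AB, BB) at one locus
print(hwe_chi_square(12, 20, 8))
```

A locus is usually reported as deviating from equilibrium when the p-value falls below 0.05.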
The ranges of mean Ne, Ho, and He in F1, F3, F4, F5, and F10 were 1.8183–1.8745, 0.4547–0.4983, and 0.4445–0.4644, respectively (Table 1). The F4 generation had the highest mean Ne (1.8745) and mean He (0.4644), whereas the F5 generation had the highest mean Ho (0.4983). One-way ANOVA revealed no significant differences in Ne, Ho, or He among the five closed-colony generations (p > 0.05; Figure 2), indicating that heterozygosity in IHB was well maintained.
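The one-way ANOVA compares between-generation variance with within-generation variance through the statistic F = [SS_between/(k − 1)] / [SS_within/(N − k)]. The source does not state which observations entered the test; the sketch assumes, hypothetically, that each generation contributes one value per SNP locus (for example, per-locus Ho). It computes only the F statistic and its degrees of freedom. The p-value comes from the F distribution, for example via `scipy.stats.f_oneway`, which returns both the statistic and the p-value.

```python
def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-locus values for three generations
print(one_way_anova_f([[0.45, 0.50, 0.48], [0.47, 0.44, 0.49], [0.46, 0.51, 0.43]]))
```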
F-statistics (FIS, FIT, and FST) analysis of the five rare minnow closed-colony generations revealed positive FIS values at nine SNP loci and positive FIT values at seven SNP loci (Table 3). The mean FIS and FIT values were −0.2659 and −0.0631, respectively; these negative values indicate low inbreeding within the closed colony. The mean FST across the five generations was 0.0146 (Table 3), and pairwise FST between generations ranged from 0.0032 to 0.0163 (Table 4), indicating low genetic differentiation among the five successive generations.
The UPGMA phylogenetic tree of the five closed-colony generations and the wild fish, based on genetic distances (Table 4), separated F10 from the other groups (Figure 3). As the number of breeding generations in the closed colony increased, the genetic distance between older and newer generations tended to increase.
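Popgene offers several distance measures, and the source does not specify which one was used. The sketch below assumes Nei's (1972) standard genetic distance, D = −ln(J_XY / √(J_X J_Y)), where J_X, J_Y, and J_XY are the means over loci of Σx_i², Σy_i², and Σx_i y_i. It then clusters groups with UPGMA, which repeatedly merges the two closest clusters and gives the merged cluster a size-weighted average distance to every other cluster. Function names and toy values are hypothetical, and branch lengths are omitted for brevity.

```python
import math
from itertools import combinations


def nei_distance(pop_x: list[dict[str, float]], pop_y: list[dict[str, float]]) -> float:
    """Nei's (1972) standard distance from per-locus allele-frequency dicts."""
    loci = len(pop_x)
    jx = sum(sum(p * p for p in fx.values()) for fx in pop_x) / loci
    jy = sum(sum(p * p for p in fy.values()) for fy in pop_y) / loci
    jxy = sum(sum(fx[a] * fy.get(a, 0.0) for a in fx)
              for fx, fy in zip(pop_x, pop_y)) / loci
    return -math.log(jxy / math.sqrt(jx * jy))


def upgma(names: list[str], dist: dict[frozenset, float]) -> str:
    """Cluster groups by UPGMA and return a Newick topology (no branch lengths)."""
    clusters = {name: 1 for name in names}   # cluster label -> number of members
    d = dict(dist)
    while len(clusters) > 1:
        # Closest pair of current clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        na, nb = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # Size-weighted average of the distances from the two merged clusters
            d[frozenset((merged, c))] = (na * d[frozenset((a, c))]
                                         + nb * d[frozenset((b, c))]) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))


toy = {frozenset(("A", "B")): 1.0, frozenset(("A", "C")): 4.0, frozenset(("B", "C")): 4.0}
print(upgma(["A", "B", "C"], toy))
```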
Using the allele information of the 20 SNP loci, Heml was used to draw a heat map of the genotype distribution of the five IHB closed-colony generations. Clustering analysis grouped 4 SNPs into an independent branch, separate from the other 16 SNPs (Figure 4).

Differences in the Results of the SNP and SSR Marker Analyses in the Same Samples
Previously, we analyzed the genetic structure of the F1–F5 IHB generations using 12 SSR loci [34]. Here, we compared the results of the SNP and SSR marker analyses in the same samples (F1, F3, F4, and F5; Table 5) and found them consistent. There were no significant differences in Ne, Ho, and He among the four generations (p > 0.05). Genetic distance analysis gave similar results: with successive IHB generations, the genetic distance between older and newer generations gradually increased.

Characteristics of SNP Markers in Rare Minnow
Compared with traditional model animals, basic genomic information for newly developed model organisms is limited. For the rare minnow, a newly emerging laboratory fish, the absence of a genome assembly has notably hampered the development of genetic markers [35][36][37]. Moreover, no SNP markers had been reported for the rare minnow. To the best of our knowledge, the present study is the first to identify SNP markers in the rare minnow: 51 SNPs in 29 genome sequences, with about 50% of the sequences containing two or more SNP loci and five SNPs identified in Seq31, reflecting an abundance of SNPs in the rare minnow genome. Most SNPs in vertebrate genomes are biallelic; however, triallelic SNPs are also observed, owing to both transition and transversion mutations at CpG dinucleotides [38]. We identified a triallelic SNP (A/C/T) at locus Seq26-2, and triallelic SNPs have also been observed in other fish [39]. BLAST annotated 33 SNPs; the remaining SNPs may be located in introns or regulatory regions. The annotated SNPs are located in several critical genes, including forkhead box P2, MyoD family inhibitor domain-containing protein, and limb development membrane protein. SNPs located in coding regions may provide valuable insights for further functional gene research in the rare minnow.

Relationship between Heterozygosity of Genetic Markers and Population History
According to the classic definitions of the genetic background of laboratory animals, the maintenance of limited genetic diversity is an essential requirement of a closed colony [40]. Heterozygosity has been introduced into genetic diversity research as a core indicator [34,41,42]. Heterozygosity estimates differ depending on the method used. For example, a genetic analysis of wild and seven cultured samples of the Pacific oyster (Crassostrea gigas) found lower mean heterozygosity with SNPs than with SSRs [43]. A similar result was obtained in wild pacu (Piaractus mesopotamicus), in which the mean heterozygosity determined with SNPs and SSRs was 0.423 and 0.612, respectively [44]. In this study, we analyzed successive generations of IHB using SNPs and found heterozygosity values of 0.4445–0.4983, lower than those reported in a previous study using SSRs [34]. Population heterozygosity estimates are therefore closely related to the type of genetic marker. This reflects inherent differences between SNPs and SSRs: SNPs are usually biallelic, whereas SSRs may have dozens of alleles. In addition, the per-locus mutation rate of SSRs (10⁻⁶ to 10⁻²) is higher than that of SNPs (about 10⁻⁹), so SNPs are more stable during DNA replication [45]. Together, these factors lead to lower heterozygosity for SNP markers than for SSR markers. Discrepancies can also arise with the same marker type, likely because different loci are selected. The genomic location of a marker can influence the stability of the locus to some extent, as natural selection pressure may reduce the mutation rate at a locus [46], so the representativeness of markers across chromosomes should be considered. Therefore, marker type and marker representativeness should be considered before genetic variability is evaluated.
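The effect of allele number can be made concrete. Expected heterozygosity, 1 − Σp_i², is largest when all k alleles are equally frequent, which gives an upper bound of 1 − 1/k. A biallelic SNP therefore cannot exceed 0.5, whereas a microsatellite with 10 equally frequent alleles could reach 0.9. The small Python sketch below (function names are illustrative) computes both quantities.

```python
def expected_heterozygosity(freqs: list[float]) -> float:
    """Gene diversity 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def max_expected_heterozygosity(n_alleles: int) -> float:
    """Upper bound of 1 - sum(p_i^2), reached when all alleles are equally frequent."""
    return 1.0 - 1.0 / n_alleles


# A biallelic SNP at its most polymorphic versus multiallelic SSR-like loci
for k in (2, 5, 10):
    print(k, round(max_expected_heterozygosity(k), 3))
```

The observed SNP heterozygosity range of 0.4445–0.4983 therefore sits close to the biallelic ceiling, even though it is lower than typical SSR values.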
Background loss of heterozygosity in the natural population is another important consideration for the low heterozygosity detected. In our previous studies, we found that the genetic diversity of a topotype population of rare minnow (between 1997 and 2006) was moderate [42], with similar results in nine wild populations in the upper Yangtze River [47]. The rare minnow mainly inhabits small water bodies such as paddy fields, ditches, and mudholes [26,27]. All known habitats of the rare minnow are a dozen to a hundred miles apart and are isolated from one another or separated by large rivers [42]. This narrow habitat range may explain the moderate diversity. In our study, the founder population of IHB was derived primarily from the topotype population mentioned above, which may explain the relatively low heterozygosity in IHB.
Although heterozygosity may be an excellent indicator for evaluating genetic variability in a closed colony, it may be difficult to agree on a heterozygosity criterion across different methods or species. For instance, in SSR-based genetic variability testing of laboratory animals, the heterozygosity threshold is generally recognized to be between 0.5 and 0.7 [41]. However, this SSR criterion may not apply to SNP-based methods. Considering the requirements of closed colonies, maintaining stable heterozygosity between generations and establishing a quality control chart may be better indicators for SNP-based methods.
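As one possible form of the quality control chart suggested above, a Shewhart-style chart could place a centre line at the mean heterozygosity of reference generations and control limits at the mean ± 3 standard deviations. Later generations outside those limits would then be flagged. This is a simplified, hypothetical sketch: the baseline values, the 3-SD rule, the generation labels, and the function names are assumptions, not a criterion proposed in this study.

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float, float]:
    """Lower limit, centre line, and upper limit (mean ± k·SD) from baseline values."""
    centre = mean(baseline)
    spread = k * stdev(baseline)
    return centre - spread, centre, centre + spread


def out_of_control(values: dict[str, float], limits: tuple[float, float, float]) -> list[str]:
    """Labels of generations whose heterozygosity falls outside the control limits."""
    lower, _, upper = limits
    return [gen for gen, h in values.items() if not lower <= h <= upper]


# Hypothetical baseline heterozygosity and two hypothetical later generations
limits = control_limits([0.46, 0.47, 0.48])
print(out_of_control({"G11": 0.47, "G12": 0.40}, limits))
```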
In our study, no significant differences in Ho and He were observed among the five closed-colony generations, indicating that the genetic diversity of IHB remained homogeneous over time.

Stability Maintenance of the Genetic Structure in a Closed Colony
To understand the genetic structure of a closed colony more comprehensively, indicators other than heterozygosity are needed, such as population genetic equilibrium, F-statistics (FST), and genetic distance. Population genetic equilibrium is an important indicator of genetic structure and is closely related to mutation, genetic drift, and natural selection in a population [48]. Some studies [49] and national standards [2] use genetic equilibrium to evaluate the genetic variability of closed colonies. Ideally, the genetic structure of a well-managed closed colony should remain in equilibrium across generations. In this study, all five closed-colony generations conformed to equilibrium based on the mean Hardy-Weinberg equilibrium test p-values. However, one locus in the F10 generation deviated from Hardy-Weinberg equilibrium, which may be related to the position of this SNP in the genome or, alternatively, to the effect of domestication on this generation or the representativeness of the samples.
FST was used to measure differentiation among successive generations within the closed colony; following Wright et al. [50], values of 0–0.05, 0.05–0.15, and 0.15–0.25 indicate "little," "moderate," and "great" genetic differentiation, respectively. In our study, the mean FST of 0.0146 indicated negligible genetic differentiation among the five generations, which may be associated with the breeding strategy of IHB: the foundation stock of IHB was bred strictly following a maximum non-inbreeding method [2].
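A per-locus FST can be sketched from heterozygosities as FST = (H_T − H_S)/H_T. Here H_S is the mean expected heterozygosity within generations, and H_T is the expected heterozygosity computed from the pooled mean allele frequency. This is Nei's G_ST formulation with equal weights and no sample-size correction, chosen as an illustrative assumption; Popgene's estimator may differ in detail, so values need not match Table 3 exactly. The classifier follows Wright's guideline cited above and adds the ">0.25, very great" band from the same guideline. Function names are hypothetical.

```python
def fst_biallelic(subpop_freqs: list[float]) -> float:
    """FST = (HT - HS) / HT at one biallelic locus, from the frequency of one
    allele in each (equally weighted) subpopulation or generation."""
    hs = sum(2 * p * (1 - p) for p in subpop_freqs) / len(subpop_freqs)
    p_bar = sum(subpop_freqs) / len(subpop_freqs)
    ht = 2 * p_bar * (1 - p_bar)
    return (ht - hs) / ht if ht > 0 else 0.0  # monomorphic locus: no differentiation


def wright_category(fst: float) -> str:
    """Wright's qualitative scale; values on a boundary go to the higher band."""
    if fst < 0.05:
        return "little"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "great"
    return "very great"


# Hypothetical allele frequencies of one SNP in five generations
value = fst_biallelic([0.42, 0.45, 0.47, 0.44, 0.50])
print(round(value, 4), wright_category(value))
```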
In a well-managed closed colony, differences in genetic distance between successive generations should be minor. However, because all matings in a closed colony occur among colony members, some alleles associated with weak reproductive capacity might be lost gradually over several generations. Thus, slight genetic differentiation may accumulate with continuous passage in a closed colony. In our study, although the key genetic parameters showed no significant differences among the five groups, the genetic distance between early and later generations increased gradually from F1 to F10. Some accumulation of genetic differentiation during passage is difficult to avoid in a closed colony; for a qualified closed colony, it is therefore necessary to maintain a stable genetic distance between adjacent generations.
Reinforcing breeding management strategies for the foundation stock is also indispensable for maintaining the stability of the genetic background: selection pressure on the parent fish should be avoided as far as possible, each allele should have an equal probability of being transmitted in the whole population, and the probability of genetic drift among generations should be reduced [51]. In addition, according to a report by the Jackson Laboratory, embryo freezing could potentially reduce the number of passages of closed colonies [52].
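The role of drift can be quantified under an idealised Wright-Fisher model. In this model, expected heterozygosity declines as H_t = H_0(1 − 1/(2N_e))^t. With unequal numbers of breeding males (N_m) and females (N_f), the effective size is N_e = 4N_mN_f/(N_m + N_f). Note that N_e here is the effective population size, not the effective number of alleles (Ne) reported elsewhere in this paper. The sketch is illustrative only: closed colonies with structured mating schemes deviate from these idealised assumptions, and the function names and numbers are hypothetical.

```python
def effective_size(n_males: int, n_females: int) -> float:
    """Effective population size with unequal sex ratio: 4*Nm*Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)


def heterozygosity_after(h0: float, ne_pop: float, generations: int) -> float:
    """Expected heterozygosity after t generations of pure drift: H0 * (1 - 1/(2Ne))^t."""
    return h0 * (1 - 1 / (2 * ne_pop)) ** generations


# Hypothetical comparison: balanced versus skewed numbers of breeders over 10 generations
for males, females in ((50, 50), (10, 90)):
    ne_pop = effective_size(males, females)
    print(males, females, ne_pop, round(heterozygosity_after(0.47, ne_pop, 10), 4))
```

Balancing the numbers of male and female breeders raises the effective size, which slows the expected loss of heterozygosity.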
In summary, maintaining stable genetic diversity and genetic structure may be the primary consideration for a qualified closed colony. When evaluating the genetic variability of a closed colony, it is necessary to select appropriate indicators, such as the stability of heterozygosity, genetic equilibrium, FST, and genetic distance, and to consider the influencing factors and scope of application of each indicator.

Conclusions
In the present study, we developed SNP markers for the rare minnow for the first time. We also conducted genetic diversity and structure analyses of successive generations of the IHB strain, establishing an SNP-based method for monitoring genetic variability in rare minnow closed colonies. Our study revealed a stable genetic structure across successive generations, demonstrating the success of the breeding strategies used for IHB. These results could help establish genetic variability control standards for the rare minnow, maintain resources for wild population restoration, and provide a reference for research on other laboratory fish.
Author Contributions: Conceptualization, J.W.; methodology, L.C. and M.H.; validation, C.X.; investigation, Z.X.; writing-original draft preparation, L.C. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Hubei Key Technology R&D Program (grant number 2015BCE098).