Assessing Genetic Diversity for a Pre-Breeding Program in Piaractus mesopotamicus by SNPs and SSRs

The pacu (Piaractus mesopotamicus) is a Neotropical fish with remarkable productive performance for aquaculture. Knowledge of genetic resources in Neotropical fish is essential for their application in breeding programs. The aim of this study was to characterize the genetic diversity of seven farmed populations of pacu, which will form the founding broodstock for future breeding programs in Brazil. One wild population (Paraná River) was analyzed as a reference against which to compare genetic parameters in the farmed populations. The analyses were performed using 32 single-nucleotide polymorphism (SNP) and 8 simple sequence repeat (SSR) markers. No significant differences between populations were detected in genetic diversity, estimated through the number of alleles, allelic richness, observed heterozygosity, expected heterozygosity, and minor allele frequency (p > 0.05). Low genetic diversity was observed in all farmed stocks and in the wild population. Moreover, we detected low genetic structure when comparing farmed and wild populations for SNPs (FST = 0.07; K = 3) and SSRs (FST = 0.08; K = 2). Analysis of molecular variance (AMOVA) demonstrated that genetic variation was mostly within populations. Kinship analysis showed that most fish farms included related individuals at a proportion of at least 25%. Our results suggest that the basal broodstock for pacu breeding programs should be founded with individuals from different fish farms to obtain higher genetic diversity and to avoid inbreeding risks.


Introduction
The pacu (Piaractus mesopotamicus) is a Characiform fish with a wide natural distribution throughout the La Plata basin, which spans five South American countries: Brazil, Uruguay, Bolivia, Paraguay, and Argentina. Wild populations of pacu are threatened by overfishing because the species has high commercial value, with large-scale catches by industrial and recreational fisheries [1]. According to the latest official statistics on industrial fisheries in Brazil, wild

SSR and SNP Analysis
DNA extraction was performed using the saline extraction protocol [19]. DNA integrity was evaluated on 1% agarose gel and its purity was assessed using a NanoDrop One spectrophotometer (Thermo Fisher, Madison, USA). The DNA concentration was quantified using the Qubit dsDNA BR Assay kit (Life Technologies, Oregon, USA) and measured in a Qubit 3.0 Fluorometer (Invitrogen, Kuala Lumpur, Malaysia).
SNP genotyping was carried out using 32 markers obtained from the liver transcriptome [18]. Analysis was performed on the MassARRAY platform (Sequenom, San Diego, CA, USA) at CeGen (Spanish Genotyping National Center, Santiago de Compostela node, Spain) as described in Mastrochirico-Filho et al. [18]. SNP data from the wild population (WILD) were previously reported by Mastrochirico-Filho et al. [18].

Statistical Analysis
The presence of null alleles (Fnull) and allelic dropout in microsatellite loci was tested using MICRO-CHECKER 2.2.3 [21]. The number of alleles per locus (Na), number of private alleles (Np), and observed (Hobs) and expected (Hexp) heterozygosity were estimated using CERVUS 3.0 [22]. The minor allele frequency (MAF) was estimated, and exact tests for deviation from Hardy-Weinberg equilibrium (HWE) (Markov chains of 100,000 steps) and for linkage disequilibrium (LD) (p < 0.05) were performed, using GENEPOP 4.0.11 [23]. The allelic richness (Ar) and inbreeding coefficient (FIS) were estimated using FSTAT 2.9.3.2 [24]. Bonferroni correction was applied for multiple tests [25].
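The per-locus diversity statistics named above can be illustrated with a minimal Python sketch. It is not the procedure implemented in CERVUS or GENEPOP: the function name `locus_diversity`, the tuple-based genotype format, and the small-sample correction 2n/(2n − 1) applied to Hexp are assumptions made for illustration, and the MAF value is meaningful only for biallelic loci such as SNPs.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summarise one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual,
    with missing genotypes already removed (hypothetical input format).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Count every allele copy (two per diploid individual).
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    freqs = {allele: c / total for allele, c in counts.items()}

    # Observed heterozygosity: share of individuals carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n

    # Expected heterozygosity under Hardy-Weinberg proportions: 1 - sum(p_i^2).
    h_exp = 1.0 - sum(p * p for p in freqs.values())

    # Small-sample correction (assumption: factor 2n / (2n - 1)).
    h_exp_unbiased = h_exp * total / (total - 1)

    # Minor allele frequency: second most common allele, 0.0 if monomorphic.
    ordered = sorted(freqs.values(), reverse=True)
    maf = ordered[1] if len(ordered) > 1 else 0.0

    return {
        "n_alleles": len(freqs),
        "h_obs": h_obs,
        "h_exp": h_exp,
        "h_exp_unbiased": h_exp_unbiased,
        "maf": maf,
    }


if __name__ == "__main__":
    # Hypothetical SNP genotypes for four individuals.
    print(locus_diversity([("A", "A"), ("A", "G"), ("A", "A"), ("A", "A")]))
```

Averaging these values over loci would give population-level summaries; the data in the usage line are invented and do not come from the study.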
The effective population size (Ne) was estimated by the linkage disequilibrium method in NeESTIMATOR V2.01 [26] and was used for a rough estimate of the rate of inbreeding in the analyzed fish farms (∆F = 1/(2Ne)) [27]. Recent bottleneck events were evaluated by M-ratio testing [28] using ARLEQUIN version 3.5.2.2 [29]. Bottleneck analysis was performed only for SSR data because of the biallelic nature of the SNPs.
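The two quantities in this paragraph are simple enough to sketch in Python. The inbreeding rate follows the formula given in the text, ∆F = 1/(2Ne). The M-ratio is written in its basic form, the number of alleles divided by the number of possible allelic states spanned by the size range; this is an assumption about the general definition and may differ in detail from the variant computed by ARLEQUIN. The function names and the allele sizes in the usage lines are hypothetical.

```python
def inbreeding_rate(ne):
    """Rate of inbreeding per generation, Delta F = 1 / (2 * Ne)."""
    if ne <= 0:
        raise ValueError("Ne must be positive")
    return 1.0 / (2.0 * ne)


def m_ratio(allele_sizes_bp, repeat_length_bp):
    """Basic M-ratio for one SSR locus: k / r.

    k is the number of distinct alleles observed; r is the number of
    repeat-unit states between the smallest and largest allele, inclusive.
    """
    sizes = sorted(set(allele_sizes_bp))
    if not sizes:
        raise ValueError("at least one allele size is required")
    k = len(sizes)
    r = (sizes[-1] - sizes[0]) / repeat_length_bp + 1
    return k / r


if __name__ == "__main__":
    # Ne = 2.3 is the smallest SNP-based estimate reported in this study.
    print(round(inbreeding_rate(2.3), 2))
    # Hypothetical tetranucleotide locus with alleles of 100, 104 and 112 bp.
    print(m_ratio([100, 104, 112], 4))
```

A locus that has lost intermediate alleles has fewer alleles than its size range allows, so its M-ratio falls below 1; values under the threshold used in the study (0.68) are read as evidence of a recent bottleneck.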
To estimate genetic differentiation between the stocks, global and pairwise FST values were calculated using FSTAT version 2.9.3.2 [24]. The significance of these values was estimated with 10,000 permutations.
Levels of admixture among stocks were inferred by estimating the optimum number of population clusters (K) [30] using STRUCTURE version 2.3.4 [31] without prior information about the population of origin. First, we determined the distribution of ∆K, an ad hoc statistic based on the rate of change in the log probability of the data between successive K values. The range of clusters (K) was predefined from 1 to 8. The analysis was performed in 80 runs (i.e., 10 replicates for each K value), each with 500,000 iterations after a burn-in period of 100,000 iterations. The most likely K value to explain the population structure was taken as the modal value of ∆K [30]. The outputs of the STRUCTURE analysis were visualized with the STRUCTURE HARVESTER program [32]. The results of independent STRUCTURE runs were summarized and aligned for the best K using CLUMPP version 1.1.2 [33].
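As a minimal sketch of how ∆K is obtained from replicate runs, the Python function below uses the form in which the statistic is commonly computed: the absolute second-order difference of the mean ln P(D) across successive K values, divided by the standard deviation of ln P(D) among replicates at that K. The function name, the dictionary input format, and the log-probability values are hypothetical and are not output from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Delta K for each K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K -> list of ln P(D) values, one per replicate run.
    Returns a dict mapping K -> Delta K.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the ends of the tested range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd
    return delta


if __name__ == "__main__":
    # Hypothetical ln P(D) values, two replicates per K.
    runs = {
        1: [-1000.0, -1002.0],
        2: [-900.0, -902.0],
        3: [-890.0, -894.0],
        4: [-888.0, -892.0],
    }
    delta_k = evanno_delta_k(runs)
    best_k = max(delta_k, key=delta_k.get)
    print(delta_k, best_k)
```

Because the statistic needs both neighbouring K values, it cannot be evaluated for the smallest or largest K tested, so the case K = 1 has to be judged separately from the ln P(D) values themselves.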
The partitioning of variation at different hierarchical levels was calculated by analysis of molecular variance (AMOVA) in ARLEQUIN version 3.5.2.2 [29] using 10,000 permutations. Stocks were grouped according to the clusters obtained by STRUCTURE software.
Kinship coefficients (rxy) [34] and the power of each locus to exclude a false parent were estimated using both SSR and SNP loci in COANCESTRY v. 1.0.1.8 [35]. Parentage exclusion probabilities were calculated for the case in which an individual taken at random from the population is excluded as a parent when no parent is known (PE2). This analysis was performed to evaluate reliably the pairwise relatedness between farmed breeders from each fish farm in the absence of any parental information. Threshold values of the kinship coefficient were adopted as follows: rxy < 0.125 corresponded to unrelated individuals; 0.125 ≤ rxy ≤ 0.375 to half-siblings; and rxy > 0.375 to full siblings [36].
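The threshold scheme above maps directly onto a small classifier. The following Python sketch applies the cut-offs stated in the text to pairwise rxy values and computes the share of related dyads; the function names and the example values are hypothetical, and the rxy estimates themselves would come from a relatedness estimator such as the one run in COANCESTRY.

```python
def classify_kinship(rxy):
    """Assign a relationship class to one pairwise kinship coefficient."""
    if rxy < 0.125:
        return "unrelated"
    if rxy <= 0.375:
        return "half-sibling"
    return "full-sibling"


def related_proportion(rxy_values):
    """Fraction of dyads classified as half- or full siblings."""
    if not rxy_values:
        raise ValueError("at least one pairwise value is required")
    related = sum(1 for r in rxy_values if classify_kinship(r) != "unrelated")
    return related / len(rxy_values)


if __name__ == "__main__":
    # Hypothetical pairwise estimates for four dyads from one fish farm.
    dyads = [0.00, 0.20, 0.50, 0.10]
    print([classify_kinship(r) for r in dyads])
    print(related_proportion(dyads))
```

Both boundary values (0.125 and 0.375) fall in the half-sibling class, following the inequalities given in the text.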

Results
The parameters of genetic variability for pacu populations determined by 8 SSR and 32 SNP markers are shown in Tables S1 and S2, respectively. The mean values of the population parameters and overall locus p values of HWE and FIS values are shown in Table 1.
For SSR loci, a total of 44 alleles were detected in the analyzed populations. The average number of alleles per locus was 3.84 ± 1.16, with numbers ranging between 2 and 7. Allelic richness ranged from 2.000 to 6.140, with average population values ranging between 3.331 ± 0.868 (FF1) and 3.975 ± 1.189 (FF7). Private alleles were detected in individuals belonging to FF2, FF4, FF7, and WILD.
Overall population FIS values spanned similar ranges for SSR (−0.113 to 0.212) and SNP loci (−0.082 to 0.056). However, no FIS values deviated significantly from zero in the fish farms when Bonferroni correction was applied for SNPs (p = 0.001) and SSRs (p = 0.006). Additionally, FIS values over all loci were not significantly different from zero for either marker type (p > 0.05), with narrow 95% confidence intervals for SSRs (−0.022 to 0.135) and SNPs (−0.075 to 0.011) obtained by bootstrapping over loci.
Reduced effective population size was detected in fish farms when compared to WILD for both SNP and SSR sets (except for FF6 in the SSR analysis). Ne estimates ranged from 2.3 in FF1 to 20.2 in WILD using SNP markers. For SSRs, Ne values ranged from 7.0 in FF4 to 60.8 in FF6 (Table 2). The rate of inbreeding (∆F) was calculated from the Ne estimates, giving the lowest values in FF6 for SSRs (0.01) and in WILD for SNPs (0.02) and the highest values in FF4 for SSRs (0.07) and in FF1 for SNPs (0.22). Evidence of recent reductions in population size, characterizing bottleneck events, was found in all fish farms (M-ratio < 0.68) in the SSR analysis (Table 2).
After the detection of reduced effective population sizes, kinship evaluation showed that most of the fish farms had related individuals (full siblings and half-siblings) in a proportion of at least 25%, as estimated from both marker sets (Table 3; Figure 1), with a significant probability of mating between related breeders. The percentage of related individuals was notable in FF4 (61.0% and 51.4% for SNPs and SSRs, respectively), with a high proportion of full siblings (25.0% and 36.0% for SNPs and SSRs, respectively). Although FF7 showed a considerable proportion of related individuals, this farm showed the highest rate of unrelated individuals (65.1% and 74.3% for SNPs and SSRs, respectively) (Figure 1). Confidence intervals (95%) of pairwise relatedness for each dyad are presented in Table S4. The PE2 for each SSR locus ranged from 0.04 (locus Pm3) to 0.31 (loci Pm2 and Pm7), with an exclusion probability of 0.81 for all loci. For SNPs, PE2 values ranged from 0.01 (loci 41_428, 437_455, 458_2209, and 83_761) to 0.12 (loci 1013_445, 213_629, 260_818, 391_875, 470_159, and 4_231), with an exclusion probability of 0.07 for all loci (Table S5).
Global FST values suggested low genetic differentiation among populations when using SSRs (FST = 0.080, p < 0.006) or SNPs (FST = 0.067, p < 0.001). Pairwise FST values were also calculated for fish farms and significant differentiation was found between most pairs when considering SNP and SSR sets (p < 0.001 and p < 0.006, respectively) (Table 4). Low to high genetic differentiation was observed among the population pairs. Pairwise FST values for SSR loci were mostly significant (p < 0.006) and ranged between −0.002 and 0.204. The highest significant FST values (p < 0.006) were observed between WILD and FF4 (FST = 0.204) and FF2 and FF4 (FST = 0.143). Conversely, the lowest genetic differentiation was detected between FF3 and FF6 (FST = −0.002, p > 0.05). Regarding SNPs, the highest genetic differentiation (p < 0.001) was between FF2 and FF4 (FST = 0.146). Conversely, WILD and FF7 (FST = 0.032) registered the lowest genetic differentiation (p < 0.001).
To evaluate the level of admixture among samples, Bayesian model-based clustering analyses were performed based on the ∆K distribution by examining SSR and SNP loci. The range of cluster numbers tested was based on the number of analyzed populations (K = 1 to 8). According to the analysis based on the Evanno method [30], the hypothesis of K = 1 was discarded due to the higher −ln P(K) values found in all analyses (data not shown). The results showed that K values of 2 and 3 for SSRs and SNPs, respectively, were the most suitable to explain the population structure of pacu stocks (Figure 2). For the SNP dataset at K = 3, moderate clustering was found between fish farms, with three putative clusters composed of (1) FF1 and FF4; (2) FF2, FF3, and FF6; and (3) FF7 and WILD. In addition, FF5 appears to be an admixture of these clusters (Figure 2a). In the SSR analysis at K = 2, however, two genetic clusters were detected: (1) FF1, FF2, and WILD and (2) FF3, FF4, and FF5. The remaining fish farms (FF6 and FF7) were considered an admixture of both genetic groups (Figure 2b). Moreover, the analysis showed a structure for K = 5 in SSRs in which FF4 forms a clear cluster of its own. Hence, these analyses supported the results of the pairwise FST analyses.
Although genetic structure among populations was confirmed by the STRUCTURE analysis, AMOVA based on the groups supported by STRUCTURE showed that most of the genetic variation was found within populations: 92.4% for SNPs (p < 0.001) and 91.1% for SSRs (p < 0.0001). The estimated variation between groups (FCT) was only 2.2% (p = 0.01) for SNPs and 3.9% (p = 0.02) for SSRs. In addition, the variation among populations within groups (FSC) accounted for 5.3% of the genetic variation (p < 0.001) for SNPs and 5.0% (p < 0.0001) for SSRs.

Discussion
Currently, the methodology for breeding programs is well established in model species of worldwide aquaculture [37]. On the other hand, there are few studies applying genetic markers for the development of genetic selection programs in Neotropical fish. Several practices associated with Neotropical fish production, especially those related to the management of broodstock, may reduce the effective population size [38]. These practices are generally linked to the lack of registration and control of the broodstock, such as information on its origin, kinship, and mating record, and maintenance of the same stock over several generations, which results in increased susceptibility to inbreeding depression that compromises the foundation of hatchery stocks when starting breeding programs [39,40]. Therefore, our results demonstrate the importance of this study, which can be considered the baseline to create the basal broodstock for initial breeding programs of pacu, one of the most important native species of South American aquaculture.
For SSR loci, the genetic diversity parameters estimated in pacu broodstocks showed low values for the number of alleles (around 4) and heterozygosity (around 0.500). Cultivated stocks were expected to present low genetic diversity, as observed here; such low values characterize populations subject to genetic drift due to low effective population sizes and, consequently, recent bottleneck or founder events, as reported in a related Neotropical species, Piaractus brachypomus [41]. However, the low levels of genetic diversity in farmed stocks were also shared by the WILD reference population. These values are lower than those of previous SSR-based studies of genetic diversity in natural pacu populations [17], which reported a higher mean number of alleles (8.5 alleles) and higher heterozygosity values (Hexp ≈ 0.600). Therefore, our hypothesis is that the WILD population may be threatened by the high level of exploitation imposed by commercial and/or recreational fishing, since the region where the WILD population was collected is a famous fishing spot in Brazil [42]. Moreover, the Paraná River has been drastically impacted by dams and pollution [17,42], which could negatively affect natural populations of pacu.
Although there is no comparative research on the genetic diversity of pacu populations using SNP markers, the values of heterozygosity (i.e., Hobs and Hexp) and MAF indicated low genetic variability in the farmed populations studied. This interpretation is based on the similar values found in WILD and the sampled fish farms, given that wild populations of pacu did not show high heterozygosity values at microsatellite loci. In addition, the diversity values observed here were similar to those in other fish studies involving SNP analyses [43–45].
In general, farmed stocks tend to show reduced genetic variability over generations due to artificial selection and a reduced number of breeders in the initial base population [11,46–48]. Therefore, it is important to evaluate how genetic diversity can be maintained, considering the possibility of bottleneck events due to the low effective population size found in fish farms. The significant bottlenecks detected for all stocks and their low Ne values (except for WILD) must receive special attention, since our results will serve to establish a base population of pacu for upcoming breeding programs.
Kinship analysis has been an essential tool in genetic pre-breeding programs of fish to reduce inbreeding rates by directing the mating of unrelated individuals [49–52]. Except for FF7, which showed a high proportion of unrelated individuals for both markers (65.1% by SNPs and 74.3% by SSRs), all fish farms showed a substantial number of related individuals (half-siblings or full siblings); this results in a higher inbreeding risk, which can affect morphological and viability traits [10]. Thus, molecular identification of individuals is necessary to effectively monitor the genetic variability of the stocks and to assess how this variation can be maintained through selective mating [13,27]; this is especially the case in FF4, which presented a high proportion of pairwise kinship, the lowest SSR-based effective population size, and a correspondingly high rate of inbreeding.
The impacts of using too few individuals in hatchery production, and their effect on the genetic diversity of cultivated populations, have been studied for important species cultivated worldwide, such as Atlantic salmon [53]. Similar studies in Neotropical species are indispensable for ensuring the correct functioning of initial breeding programs, mainly because of the traditional practices used to manage Neotropical broodstocks and the lack of genetic information for these species with high production potential.
In the present study, our initial hypothesis was that pacu farmed stocks did not have gene flow, because the broodstocks are geographically isolated and producers frequently do not exchange breeders among fish farms. Therefore, we would expect higher FST estimates indicating genetic differentiation (driven by genetic drift and isolation), as detected in FF4 and as found in previous studies of related species such as P. brachypomus [41]. However, overall FST values, although significant (p < 0.05), were only 0.067 for SNPs and 0.080 for SSRs, indicating low differentiation between the farmed stocks. This may be explained by two hypotheses: (i) stock foundation based on sharing of breeders among fish farms, resulting in genetic similarities between broodstocks, and/or (ii) stock foundation in the fish farms based on the capture of wild breeders, which belong to a panmictic unit given the lack of genetic structure in natural populations [15,17], particularly because pacu have high gene flow capacity due to their migratory behavior in the wild.
The STRUCTURE analysis revealed differences between SNPs and SSRs, showing three and two clusters for the pacu stocks, respectively. In this study, the SNP dataset originated from the pacu liver transcriptome, and these SNPs were annotated mostly in genes related to productive traits, including SNPs classified as non-synonymous mutations [18]. Therefore, the two marker types might differ in their ability to detect population structure; SNPs are mainly gene-associated while SSRs are expected to be neutral markers, which results in different mutation rates. Neutral markers, such as SSRs, are widely used to analyze genetic variation mainly in natural populations, while gene-associated markers could be more useful to analyze the variability of organisms in response to artificial selection [54,55]. This can explain the differentiation of FF1/FF4 from the other fish farms as revealed by SNPs, particularly given the breeding management practices carried out in these fish farms, such as mass selection to obtain individuals with better growth performance. To design suitable breeding programs in terms of genetic diversity in farmed pacu stocks, we also assume that it is better to use the information from the clusters generated by the SNPs. However, as SNPs are generally biallelic and less polymorphic than SSRs, additional SNP markers need to be included in further analyses to reach firmer conclusions about the genetic structure of pacu farmed stocks.
The results of this study are intended to provide initial knowledge about the genetic profile of pacu stocks in different fish farms, considering the importance of pacu to South American aquaculture and the need to support the development of its production. The results provide information relevant to one of the most important cultivated Neotropical species. The genetic variability and differentiation of stocks, and the profile of each fish farm with respect to inbreeding risk and the need for directed matings, should be known in order to take appropriate actions for the creation of the base population. In conclusion, the SNP and SSR sets showed their applicability in a pre-breeding program, particularly in guiding the formation of the best families in terms of genetic variability and genetic structure.

Conflicts of Interest:
The authors declare no conflict of interest.