Microsatellite Development for an Endangered Bream Megalobrama pellegrini (Teleostei, Cyprinidae) Using 454 Sequencing

Megalobrama pellegrini is an endemic fish species found in the upper Yangtze River basin in China. This species has become endangered due to the construction of the Three Gorges Dam and overfishing. However, the available genetic data for this species are limited. Here, we developed 26 polymorphic microsatellite markers from the M. pellegrini genome using next-generation sequencing techniques. A total of 257,497 raw reads were obtained from a quarter-plate run on the 454 GS-FLX Titanium platform, and 49,811 unique sequences were generated with an average length of 404 bp; 24,522 (49.2%) sequences contained microsatellite repeats. Of the 53 loci screened, 33 were amplified successfully and 26 were polymorphic. The genetic diversity in M. pellegrini was moderate, with an average of 3.08 alleles per locus, and the mean observed and expected heterozygosity were 0.47 and 0.51, respectively. In addition, we tested cross-species amplification for all 33 loci in four additional breams: M. amblycephala, M. skolkovii, M. terminalis, and Sinibrama wui. The cross-species amplification showed a high level of transferability (79%–97%), which might be due to the close genetic relationships among these species. The polymorphic microsatellites developed in the current study will not only contribute to further conservation genetic studies and parentage analyses of this endangered species, but also facilitate future work on the other closely related species.


Introduction
Megalobrama pellegrini is a cyprinid fish species belonging to Cultrinae (Cypriniformes), which is endemic to China. It is distributed in the main streams and tributaries along the upper reaches of the Yangtze River [1,2] in the Sichuan basin. Recently, this species has become endangered as a consequence of a sharp decrease in population size due to overfishing and/or the loss of habitats following completion of the Three Gorges Dam and several other dams along the upper Yangtze River [3][4][5][6], as M. pellegrini must inhabit flowing waters, especially in the breeding season [5]. Field surveys revealed that only one wild population of M. pellegrini had been found, in the Longxi River, one of the tributaries of the upper Yangtze River [5]. However, studies on M. pellegrini are limited, and most have been restricted to artificial propagation, population ecology, and molecular endocrinology [5][6][7][8][9][10]. Regarding genetic studies, only Liu and Wang [11] studied the genetic structure of this species based on allozyme markers and indicated that M. pellegrini has a relatively high level of genetic variation, especially compared with the congeneric Wuchang bream, M. amblycephala. However, no other molecular markers are available for the study of population and conservation genetics in this species. Furthermore, the validity of species in Megalobrama has long been controversial because of the strong similarities in morphological traits among the members of this genus [12,13]. M. pellegrini had been treated as a synonym of M. terminalis, but several recent studies have recognized it as a valid species and, based on morphological data, have clarified that the genus Megalobrama includes four valid species: M. amblycephala, M. pellegrini, M. skolkovii, and M. terminalis [1,12].
In short, molecular markers are required for population and conservation genetic studies, parentage analyses, and genetic mapping of this endangered species.
Microsatellites, or simple sequence repeats (SSRs), have been used widely since the late eighties for applications such as parentage analyses, population genetic structure and conservation genetics because of their high level of polymorphism, relatively small size and rapid analysis protocol [14][15][16][17]. However, the development of microsatellite loci using traditional methods is not only costly and time-consuming [18] but also limited by the difficulties of de novo development in species without any genomic information. Recently, the emergence of next-generation sequencing technologies has rapidly improved SSR development. Among next-generation sequencing technologies, the 454 GS-FLX technology (Roche Applied Science) has created new opportunities and made high-throughput microsatellite development cheaper and faster [19][20][21][22][23][24][25][26][27][28]. The first reports of this promising application were published in 2009 [19][20][21]. In addition, compared with single-nucleotide polymorphisms (SNPs), which have been increasingly used since the late nineties, SSRs offer high allelic diversity and the relative ease of transfer between closely related species [29]. Hence, SSRs could remain relevant genetic markers, at least for some specific applications.
Here, we present the development of 26 polymorphic species-specific microsatellite loci for the endangered species M. pellegrini based on the 454 sequencing technology. Subsequently, we tested all 33 microsatellite loci (26 polymorphic and 7 monomorphic in M. pellegrini) in four other related species: M. amblycephala, M. skolkovii, and M. terminalis from the same genus, and Sinibrama wui from a closely related genus. Of these four species, only M. amblycephala had previously published microsatellite primers [30]. The microsatellite markers described herein offer important genetic resources for the assessment, understanding and conservation of the endangered species M. pellegrini, and facilitate future work on the other related species.

454 Sequencing Results
The raw sequence data from the 1/4 run of 454 sequencing were 90.2 Mbp containing 257,497 reads/sequences with an average length of 367 bp (maximum: 644 bp, minimum: 21 bp). The raw sequences represented large numbers of individual sequence reads, which could be assembled into contigs. A total of 208,525 reads were assembled into 839 contigs with an average length of 847 bp (maximum: 9,191 bp, minimum: 500 bp), leaving 48,972 singletons. The mean length of these 49,811 sequences (839 contigs plus 48,972 singletons) was 404 bp, which was slightly longer than that of the raw sequences. Of the 49,811 unique sequences, 24,522 (49.2%) sequences contained SSRs, and 14,987 (30.1%) sequences could be used for SSR primer design.
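The read and sequence counts above can be cross-checked with a few lines of arithmetic. The sketch below only restates figures given in the text; the variable names and the `percent` helper are illustrative and not part of the original analysis.

```python
# Counts reported in the text for the quarter-plate 454 run.
raw_reads = 257_497
assembled_reads = 208_525   # reads that went into contigs
contigs = 839
singletons = 48_972
ssr_sequences = 24_522      # unique sequences containing SSRs
primer_ready = 14_987       # unique sequences usable for primer design


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Unique sequences are the contigs plus the unassembled singletons.
unique_sequences = contigs + singletons
# Every raw read is either assembled into a contig or left as a singleton.
reads_accounted_for = assembled_reads + singletons

print(unique_sequences)                          # 49811
print(reads_accounted_for == raw_reads)          # True
print(percent(ssr_sequences, unique_sequences))  # 49.2
print(percent(primer_ready, unique_sequences))   # 30.1
print(percent(assembled_reads, raw_reads))       # 81.0
```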

454 Sequencing Results
Longer reads may increase the likelihood of detecting microsatellite loci with more repeats, which are expected to be more polymorphic [27,31]. We assembled the raw sequences/reads into contigs to increase the sequence length and eliminate the repetitive sequences. Approximately 81.0% (208,525) of the 257,497 raw sequences were assembled into 839 contigs, which eliminated the repetitive sequences to a large extent. The average length of the contigs and singletons was slightly longer than that of the raw sequences (404 bp vs. 367 bp), which was long enough for the identification of SSRs and primer design [29]. Furthermore, the vast numbers of remaining sequences that did not contain microsatellite repeats would be useful for other purposes, such as molecular marker development for phylogenomic or ecological genomic studies.

[Figure or table caption: results of the tests for polymorphism and amplification across the five related species. 'P', polymorphic.]

The Microsatellite Development from M. pellegrini
The development of polymorphic microsatellites using 454 sequencing technology has become an increasingly common method that yields more polymorphic markers at lower cost and in less time than traditional methods [29]. In the present study, we successfully developed 26 polymorphic microsatellite loci for an endangered fish species. Seven loci deviated significantly from HWE, most likely because of null alleles, which were detected at four of those loci. Homozygote and/or heterozygote excess (e.g., at MP6, MP9, MP11, MP34, and MP46) could also contribute to the observed deviation. Eight of the 26 polymorphic loci showed moderately high polymorphism (PIC > 0.50; Table 1) and could be used to further resolve population structure and in other conservation genetic studies [32]. In addition, the development of reliable microsatellite markers with moderately high PIC for M. pellegrini is a first step toward marker-assisted selection in directed breeding programs and would be useful for parentage assignment in selected M. pellegrini.
The genetic diversity of M. pellegrini based on the 26 polymorphic microsatellite loci in this study was moderate (mean N A = 3.08, H O = 0.47, H E = 0.51). Liu and Wang [11] also evaluated the genetic diversity of M. pellegrini using allozymes, and revealed that the mean heterozygosity was only 0.091, which was much lower than that from our study. This disparity was also observed in many other studies [33][34][35][36], which was likely caused by the different resolving power of different markers in the assessment of genetic diversity. The genetic diversity of M. pellegrini was similar to that of M. amblycephala (mean N A = 2.9, H O = 0.60, H E = 0.55) [30]. In addition, the microsatellite diversity in M. pellegrini was slightly lower than those from other endangered fishes endemic to the upper Yangtze River basin, such as largemouth bronze gudgeon (Coreius guichenoti, mean N A = 5.2, H O = 0.42, H E = 0.63) [37], Chinese rare minnow (Gobiocypris rarus, mean N A = 4.4, H O = 0.51, H E = 0.65) [38], and rock carp (Procypris rabaudi, mean N A = 6.9, H O = 0.71, H E = 0.77) [39].
M. pellegrini inhabits flowing waters and has strict habitat requirements, especially in the breeding season [5]. However, the Three Gorges Dam, built between 1994 and 2006, has critically damaged this habitat through drastic environmental changes; for example, the flow regime was altered from free-flowing to stagnant, resulting in a substantial decline of biodiversity [3]. Furthermore, increased human activity, particularly overfishing, is also a crucial factor in the population decline of M. pellegrini [40]. All of these factors have contributed to the sharp population decline of M. pellegrini in recent years [4,6]. Accordingly, the observed moderate genetic diversity in M. pellegrini (mean N A = 3.08, H O = 0.47, H E = 0.51) might be caused by the severe decline in population size in recent decades, and diversity could erode further if the wild population of this species continues to decrease. Hence, there is an urgent need to create effective management strategies for the conservation of wild populations of M. pellegrini. Artificial propagation and supplementation might be efficient approaches for the population recovery and preservation of the genetic diversity of this endemic species [3,40].

High Level of Cross-species Amplification
The amplification success and cross-species transferability of the markers in this study were high (Table 1 and Figure 1), which might be due to the close genetic relationships among the tested species. The validity of the species in Megalobrama had been controversial for a long time; however, several studies [1,12,13] have recently clarified that there are four valid species in Megalobrama: M. amblycephala, M. pellegrini, M. skolkovii, and M. terminalis. Cai et al. [13] compared the morphological traits of the four species in Megalobrama and revealed that some of their traits were partly similar. Furthermore, the pairwise genetic distances among the four species are very low based on the complete mitochondrial genome dataset (Wang et al. unpublished data). Both the morphological and the molecular comparisons suggest that the fishes in Megalobrama have close genetic relationships and might have diverged from their common ancestor relatively recently. Furthermore, the high cross-species transferability and levels of polymorphism of the microsatellite loci determined here for all breams will be valuable for population genetic studies, without the need for expensive and time-consuming de novo microsatellite development.

Sample Collection and 454 Sequencing
M. pellegrini samples were collected from the Longxi River, a tributary of the upper Yangtze River in Sichuan Province. Species identification was verified upon examination of morphological distinctions [1]. Fresh muscle tissues stored in 95% ethanol were used for the extraction of gDNA using the DNeasy Blood and Tissue kit (Qiagen). The quality and quantity of the gDNA were determined using an AstraGene Life Sciences Spectrophotometer (Astranet Systems Ltd, Newton, Cambridge, UK). The gDNA from a single sample was subjected to sequencing on a 1/4 plate in a 454 Life Sciences Genome Sequencer FLX Titanium instrument (Roche).

Microsatellite Discovery and Primer Screening
The resulting raw sequences for M. pellegrini were assembled into contigs using the Newbler 2.3 software. MSATCOMMANDER version 0.8.2 [41] was used to screen for microsatellites with the default parameters (minimum repeats of 10, 6, 4, 4, 4, and 4 for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs, respectively). A Perl script was used to select sequences longer than 400 bp from among the contigs and singletons, which were then searched for SSRs and used for primer screening. Finally, the microsatellite loci used for the polymorphism screening were selected with additional constraints (minimum repeats of 14, 10, and 8 for perfect di-, tri-, and tetra-nucleotide microsatellites, respectively) to increase the probability of polymorphism. The primers were designed using the Primer Premier software version 5 (www.PremierBiosoft.com), with the following criteria to identify loci with a good likelihood of reliable amplification: (i) GC content 40-60%; (ii) product size 150-350 bp; (iii) primer length 18-25 bp; (iv) melting temperature 50-60 °C with a maximum 2 °C difference between paired primers; and (v) maximum poly-N at the 3′ end <3.
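The selection rules above can be illustrated with a minimal sketch. It is not the MSATCOMMANDER, Perl, or Primer Premier workflow used in the study: the regular-expression scan, the function names, and the example sequence and primers are all hypothetical, melting temperatures are assumed to be supplied by whatever tool computed them, and criterion (v) is left out because its exact definition is not given in the text.

```python
import re

# Stricter repeat thresholds from the text: motif length -> minimum repeats.
MIN_REPEATS = {2: 14, 3: 10, 4: 8}


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # One copy of the motif followed by at least (min_n - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


def gc_percent(primer):
    """GC content of a primer as a percentage."""
    primer = primer.upper()
    return 100 * sum(primer.count(base) for base in "GC") / len(primer)


def passes_primer_criteria(forward, reverse, tm_forward, tm_reverse, product_size):
    """Check criteria (i)-(iv) from the text for one primer pair."""
    primers = (forward, reverse)
    return (
        all(40 <= gc_percent(p) <= 60 for p in primers)
        and 150 <= product_size <= 350
        and all(18 <= len(p) <= 25 for p in primers)
        and all(50 <= tm <= 60 for tm in (tm_forward, tm_reverse))
        and abs(tm_forward - tm_reverse) <= 2
    )


# Hypothetical sequence: (AC)15 and (AGAT)8 qualify, (ATG)9 falls short of 10 repeats.
example = "GATTACA" + "AC" * 15 + "GGTCA" + "ATG" * 9 + "TTGCA" + "AGAT" * 8 + "CCG"
for start, motif, count in find_perfect_ssrs(example):
    print(start, motif, count)

# Hypothetical primer pair with 50% GC content and 20 bp length.
print(passes_primer_criteria("ACGTACGTACGTACGTACGT", "TGCATGCATGCATGCATGCA", 55.0, 56.5, 210))
```

Filtering out non-primitive motifs keeps a homopolymer run from being reported as a di-nucleotide repeat, which matters because the stricter thresholds apply only to perfect di-, tri-, and tetra-nucleotide loci.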

Marker Testing and Cross-species Amplification
All of the selected primer pairs were initially tested for polymorphism in 8 individuals from the Longxi River population. The total PCR reaction volume was 12.5 µL, containing 1.25 µL of 10X buffer, 25 mM MgCl2, 2.5 mM of each dNTP, 0.5 U Taq DNA polymerase (rTaq, TaKaRa), 3.0 pmol each of forward and reverse primer, and ~20 ng of DNA template. The thermocycler settings for the polymerase chain reaction (PCR) were programmed as follows: 94 °C (5 min), followed by 36 cycles at 94 °C (30 s)/Tm (45 s)/72 °C (40 s), and a final extension at 72 °C for 7 min; the annealing temperature (Tm) was optimized for each primer pair. Amplification was scored by the presence of a visible band after running 7 µL of PCR product on a 6% denaturing polyacrylamide gel (PAGE gel). The pBR322 DNA/MspI molecular weight marker (TIANGEN) was used as a standard for the assessment of product size. We excluded loci that could not be amplified or that yielded double or faint bands, even after attempts to adjust the PCR conditions. Furthermore, the PCR was performed twice for each polymorphic locus to confirm reproducibility.
Finally, all the successfully amplified microsatellite loci in M. pellegrini (both polymorphic and monomorphic) were assessed for cross-amplification in four other related species (M. amblycephala, M. skolkovii, M. terminalis, and S. wui), using five individuals per species.

Statistical Analysis
The screened polymorphic loci were tested for genetic diversity at the population level, using 30 individuals collected from the Longxi River. We used POPGENE version 1.31 [42] software to determine the following summary statistics: number of alleles (N A ), observed and expected heterozygosity (H O and H E ). Exact tests were implemented in Arlequin 3.5 [43] for the assessment of deviations from Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) between the loci. CERVUS version 3.0.3 [44] was used to determine the polymorphic information content (PIC) for each locus, and the presence of null alleles was assessed at a 95% confidence interval using MICRO-CHECKER version 2.2.3 [45].
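To make the per-locus summary statistics concrete, the sketch below computes N A, H O, H E, and PIC from diploid genotypes. It is an illustration under stated assumptions, not a reproduction of POPGENE or CERVUS output: H E is computed as Nei's unbiased estimate, PIC follows the standard formula of Botstein et al., and the function name and the genotype data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_summary(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [count / (2 * n) for count in allele_counts.values()]
    sum_sq = sum(p * p for p in freqs)

    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Expected heterozygosity with the small-sample correction 2n / (2n - 1).
    h_exp = (2 * n / (2 * n - 1)) * (1 - sum_sq)
    # PIC = 1 - sum(p_i^2) - sum over i < j of 2 * p_i^2 * p_j^2.
    pic = 1 - sum_sq - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {"N_A": len(allele_counts), "H_O": h_obs, "H_E": h_exp, "PIC": pic}


# Hypothetical genotypes (allele sizes in bp) for 10 individuals at one locus.
genotypes = [(180, 180)] * 3 + [(180, 184)] * 4 + [(184, 184)] * 3
summary = locus_summary(genotypes)
print(summary["N_A"])            # 2
print(round(summary["H_O"], 3))  # 0.4
print(round(summary["H_E"], 3))  # 0.526
print(round(summary["PIC"], 3))  # 0.375
```

With two equally frequent alleles, PIC reaches only 0.375, which shows why a locus needs several alleles at reasonable frequencies to pass the PIC > 0.50 threshold used in the discussion.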

Conclusions
In this study, we developed 26 polymorphic microsatellite markers for the endangered fish M. pellegrini using the 454 sequencing technology. The genetic diversity of M. pellegrini in a wild population was moderate and could become even lower, given the continuously decreasing size of its wild population. Furthermore, the cross-species transferability of the novel markers in this study was high, which might be due to the close genetic relationships among the tested species. In summary, the novel polymorphic microsatellites isolated herein will not only facilitate further parentage analyses, conservation genetic studies, and effective management of M. pellegrini but also be useful for exploring the genetic diversity and genetic structure of other breams in Cultrinae.