Rapid Development of Microsatellite Markers with 454 Pyrosequencing in a Vulnerable Fish, the Mottled Skate, Raja pulchra

The mottled skate, Raja pulchra, is an economically valuable fish. However, due to a severe population decline, it is listed as a vulnerable species by the International Union for Conservation of Nature. To analyze its genetic structure and diversity, microsatellite markers were developed using 454 pyrosequencing. A total of 17,033 reads containing dinucleotide microsatellite repeat units (mean, 487 base pairs) were identified from 453,549 reads. Among 32 loci containing more than nine repeat units, 20 primer sets (62%) produced strong PCR products, of which 14 were polymorphic. In an analysis of 60 individuals from two R. pulchra populations, the number of alleles per locus ranged from 1–10, and the mean allelic richness was 4.7. No linkage disequilibrium was found between any pair of loci, indicating that the markers were independent. The Hardy–Weinberg equilibrium test showed significant deviation in two of the 28 single-loci after sequential Bonferroni’s correction. Using 11 primer sets, cross-species amplification was demonstrated in nine related species from four families within two classes. Among the 11 loci amplified from three other Rajidae family species; three loci were polymorphic. A monomorphic locus was amplified in all three Rajidae family species and the Dasyatidae family. Two Rajidae polymorphic loci amplified monomorphic target DNAs in four species belonging to the Carcharhiniformes class, and another was polymorphic in two Carcharhiniformes species.

In this work, we carried out a global analysis of the repetitive fraction of the R. pulchra genome using NGS combined with the bioinformatics approach described above. This work will be useful for future resource management of this economically and ecologically important, but vulnerable, species.

Pyrosequencing, Repeat Identification and Classification
NGS produced 453,549 reads with an average length of 487 bp; 4,305 contiguous sequences (contigs) were constructed by Newbler (version 2.6; Roche) from these reads. The total number of bases was 2,024,317, and the largest contig was 16,906 bp. In addition, we found 307,931 unassembled singleton sequences for a total read of 150,038,230 bp. The total number of reads after trimming of low quality sequence was 288,854, and the sum of the total reads was 148,321,261 bp in length, which was used in a search for SSRs. The number of reads with dinucleotide repeat were: 4164 AT, 6544 CA, 6239 CT, and 86 GC repeats. The sequence and number of 2625 trinucleotide repeats are shown in Table 1.  8  TAC  8  AAT  273  TTG  124  AAG  164  TTC  102  AAC  62  TGA  65  ATA  238  TGG  124  ATG  53  TGC  104  ATC  89  TCG  3  AGA  85  TCC  134  AGT  10  GAG  117  AGG  98  GAC  6  AGC  82  GTG  54  ACA  56  GGG  10  ACG  2  GGC  32  ACC  90  GCG  26  TAA  261  CAG  103  TAG  14  CGG  28

Selection and Characterization of the SSR
Thirty two loci containing dinucleotide microsatellites with over nine repeats were selected for further evaluation. Primers were designed for these loci and tested in four individuals. Twenty primer sets (62%) produced strongly amplified PCR products, and 14 showed clear amplification with polymorphic patterns. However, no clear genotype pattern was obtained for five loci, and Rp19-nfrdi was monomorphic in all of the tested individuals. The primer sequences, repeat motifs, annealing temperatures, PCR ranges, number of alleles, and GenBank accession numbers for the 15 new microsatellite loci are summarized in Table 2. The number of alleles per locus varied from one (at Rp19-nfrdi) to ten (at Rp11-nfrdi) in a total of 60 tested individuals.   Table 3 summarizes the genetic characterization indices estimated for the two skate ray populations. Allelic richness per locus ranged from 2-9.8 in the two populations. The mean allelic richness was 4.7. The average number of alleles among all populations was 4.8. In total, there were 13 unique alleles, five from the Daecheong-do population and eight from the Heuksan-do population. No linkage disequilibrium was found between any pair of loci (p > 0.05), indicating that the markers were independent. The Hardy-Weinberg equilibrium (HWE) test, indicating the deviation from expected heterozygosity, showed significant deviation in two (Rp3-nfrdi and Rp43-nfrdi) of the 28 single-loci in the Heuksando population after sequential Bonferroni's correction. Null alleles were presumed in four (Rp3, Rp11, Rp18 and Rp43-nfrdi) of 14 loci. No genetic differentiation between DC and HS populations was detected by F ST (=0.013) using all 14 microsatellite markers.

Cross-Species Amplification
Cross-species amplification of 11 loci was conducted in nine related species belonging to four families within two classes, as shown in Table 4. These are representative species of cartilaginous fish. All of the 11 loci amplified the target DNAs from the three other fish species, Okamejei boesemani, O. kenojei and O. acutispina that belong to the same family, Rajidae. Among these loci, Rp11, Rp16 and Rp53 were polymorphic in all three species. The locus Rp16, which was monomorphic in R. pulchra, was also monomorphic in all three species. Additionally, the monomorphic locus Rp16 was amplified in two fish species, Dasyatis akajei and Uroplophus aurantiacus, belonging to the Dasyatidae family. In contrast, no amplification of other loci was observed in either or both of these species. The Rp3 and Rp16 loci that were polymorphic in the Rajidae family amplified monomorphic target DNAs in four fish species belonging to the class Carcharhiniformes, and the Rp35 locus was polymorphic in two species belonging to this class (Table 4).

Discussion
Among 11 skate species identified in Korea so far, R. pulchra is the most economically important species, and it is consumed raw or as fermented fish. It inhabits coastal areas around the Korean Peninsula and was extremely common about 20 years ago in the Yellow Sea, especially around the Daecheong-do and Heuksan-do Islands of Korea. As in other skates, reproduction of R. pulchra is oviparous, and the number of eggs in each spawning is less than 250 [4]. Because there has been a >90% catch decrease over the past ten years, it has been placed on the "vulnerable" species list by the IUCN [8]. Therefore, it is necessary to understand the genetic structure and population diversity for future management and resource restoration.
MS markers have many advantages over other molecular markers for the study of population structure and diversity [25]. However, the cloning and sequencing processes commonly used in MS marker development are time consuming and costly, often with positive clone yields as low as 0.03% [26].
NGS has recently been applied in the development of MS markers in many organisms, greatly reducing the cost, labor and time required [13,15,16,27]. This method has been shown to be more effective when enriched DNA libraries are used for sequencing [28]. Despite the advantages, the development of MS markers using NGS in aquatic organisms has been limited, but has been proven to be effective in recent publications [21][22][23][24]29].
In addition to speed and cost-effectiveness, NGS offers high flexibility in primer design because a large number of reads and sequences containing SSRs and loci with the proper repeat units can be selected. In our study, 17,033 (5.5%) from a total of 312,290 contigs contained dinucleotide SSRs, and 33 loci containing a minimum of nine repeat units were used for primer design, which is only 0.2% of the identified SSRs. In the analysis of MS markers reported in plant, only 1% of polymorphic loci identified using NGS were selected by stringent criteria for primer design, compared with 20% of SSRs discovered using a traditional cloning and sequencing process were selected for primer design [30].
To select optimal primers from a large number of candidates, high-quality standards can be applied in loci selection, which can increase the success rate and reduce the time, labor and cost of marker development. For instance, to minimize the risk of null allele amplification, 454 sequence alignments can be used to design primers using loci with a reasonably high depth of coverage at both primer-annealing sites, and that show no evidence of SNPs within the regions [31]. In our study, 20 (62%) primer sets among 32 loci containing a minimum of nine repeat units produced amplified PCR products, and 14 (70%) of them were polymorphic.
After sequential Bonferroni correction for multiple tests, significant deviations from HWE in the direction of heterozygote deficiency were detected at only two (Rp3-nfrdi and Rp43-nfrdi) of the 14 tested loci in the Heuksan-do population. Generally, heterozygote deficiencies increase due to factors such as inbreeding, substructuring of the population sample, or the presence of null alleles. Indeed, our Microchecker analysis revealed the presence of null alleles at four loci, including Rp3-nfrdi and Rp43-nfrdi. The level of genetic diversity (mean Ho = 0.57, He = 0.61; mean allelic number = 4.7) in this study was slightly higher than that of the common skate, Dipturus batis (mean Ho = 0.35, He = 0.36; mean allelic number = 3.5) [32]. No genetic differentiation between populations by F ST may suggest that the two populations can be regarded as one population. However, differentiation of genetically related skate population by distance has been observed in northeast Atlantic continental shelf [32]. To reduce the time and cost of MS marker development, pre-existing markers from related species have been applied to the study of many animal, fungus, plant and fish species [33,34]. Moreover, cross-species markers can be used for the identification of invading species and the identification of parental origin in hybrid fish [35]. Although the cross-species transferability of MS markers is unevenly distributed among taxa, over 40% and 25% of polymorphic marker transfers have been observed in fish of different genera within the same family, and of families within same order, respectively [33].
With large numbers of SSR candidates identified by pyrosequencing, the development of cross-species markers has been attempted in diverse organisms [36]. The markers developed in this study were tested for cross-species amplification and possibility of finding universal markers for cartilaginous fish with available fish samples. Among the ten markers polymorphic in R. pulchra, four (40%) were polymorphic in other species within same family, which is comparable to previous data [33]. However, none of them showed any polymorphic amplification of targets in specimens belonging to different families within the same species. One interesting finding was that loci Rp16 and Rp35, which do not amplify the target sequences in the family Dasyatididae, produced monomorphic or polymorphic PCR products in Carcharhiniformes. Although the numbers of genera and samples for each genus are limited, these results suggest the possibility that cross-species polymorphic markers can be developed in fish species using the pyrosequencing technique with large number of SSR candidates for marker development.
The rapid population decline of R. pulchra around the Korean Peninsula, especially in the Yellow Sea, is due to many factors such as overfishing and destruction of their natural habitat. However, there have been no conservation measures in place for this species. Therefore, the molecular markers developed in this study will be useful for future assessment and monitoring of population trends of this species. In addition, this study suggests that pyrosequencing methods will be useful for SSR development in aquatic organisms, and may be preferable to the traditional labor-and time-intensive cloning methods.

Sample
A total of 60 wild R. pulchra samples were collected around the Korean islands: 30 from the Daecheong-do population and 30 from the Heuksan-do population. Muscle tissue samples were preserved in 100% ethanol at the sampling site and then transported to the laboratory for DNA extraction. Total DNA was isolated from each sample using a MagExtractor MFX-6100 automated DNA extraction system (Toyobo, Osaka, Japan). The extracted genomic DNA was quantified using a Nanodrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Barrington, IL, USA) and stored at −20 °C until genomic DNA pyrosequencing. For the cross-species transferability test, DNA was extracted by the same method from ethanol-fixed tissues of nine related species belonging to four families within two classes that had been stored at the National Fisheries Research and Development Institute, Busan.

DNA Sequencing
DNA sequencing was performed with 454 pyrosequencing on a Genome Sequencer FLX-454 System (GS FLX sequencer). Sample preparation and DNA sequencing was performed according to the manufacturer's instructions (Roche Diagnostics, Mannheim, Germany). DNA sample was prepared from one individual and sequencing was conducted using a half of a 454 plate at the National Instrumentation Center for Environmental Management (NICEM) of Seoul National University.

De Novo Assembly and SSR Findings
The raw reads from the GS FLX 454 were assembled using the Newbler 2.6 software and the assembled contigs consensus and unassembled singleton sequences were merged to discover SSRs on the genome sequence. The sequence was filtered with high quality score using the Less Useful Chunks Yank (LUCY) 1.20p. A modified "SSR_finder.pl" perl program was then used to detect di-and tri-SSR markers. A pair of primers flanking each SSR was designed using Primer3 software [37]. The primer redundancy was tested by using the BLAST available at NCBI [38] on the basis of ≤0.001 e-values.

PCR and Genotyping
Newly-designed PCR primer pairs were tested to optimize annealing temperatures; a gradient PCR with a 50-60 °C annealing temperature range was performed on a set of samples from eight individuals. PCR amplification was performed in a 10 µL reaction mixture containing 0.25 U Ex taq DNA polymerase (TaKaRa Biomedical Inc., Shiga, Japan), 1× PCR buffer, 0.2 mM dNTP mix, 10 pmol each primer (the forward primer of each pair was 5′-end-labeled with 6-FAM, NED, and HEX dyes; PE Applied Biosystems), and 100 ng template DNA, using a PTC 200 DNA Engine (MJ Research, Waltham, MA, USA). PCR conditions were as follows: 11 min at 95 °C, followed by 35 cycles of 1 min at 94 °C, 1 min at the annealing temperature listed in Table 2, and 1 min at 72 °C, with a final extension of 5 min at 72 °C. Microsatellite polymorphisms were screened using an ABI PRISM 3130 XL automated DNA sequencer (Applied Biosystems), and alleles were designated according to PCR product size, relative to a molecular size marker (GENESCAN 400 HD [ROX]; PE Applied Biosystems).

Statistical Analysis
The number of alleles per locus, allele frequency, and heterozygosity were calculated using Arlequin version 3.0 [39]. Tests for population-wide linkage disequilibrium between pairs of loci and deviations from HWE were estimated using GENEPOP version 4.0 [40], and the adjusted p-values for both analyses were obtained using a sequential Bonferroni test for multiple comparisons. MICRO-CHECKER version 2.2.3 [41] was used to test the presence of null alleles. Allelic richness (A R ) as a standardized measure of the number of alleles per locus, independent of the sample size, was calculated using FSTAT version 2.9.3 [42]. A possible geographical pattern in the distribution of genetic variability was analyzed through F ST estimates and genetic distances between each pair of populations.

Conclusions
The pyrosequencing method was applied in the development of MS markers for R. pulchra, which is listed as a "vulnerable" species by the IUCN. A comparably high primer-to-marker conversion ratio (62%) was achieved by this method. In the cross-species amplification, over 90% of the markers were amplified in the fishes belonging to the same family but the success ratio was much less in the different family. The molecular markers developed in this study can be used for future management of this economically and ecologically important species.