Int. J. Mol. Sci. 2012, 13(11), 14946-14955; doi:10.3390/ijms131114946
Abstract: Tarim schizothoracin (Schizothorax biddulphi) is an endemic fish species native to the Tarim River system of Xinjiang and has been classified as an extremely endangered freshwater fish species in China. Here, we used a next-generation sequencing platform (Ion Torrent PGM™) to obtain, for the first time, a large number of microsatellites for S. biddulphi. A total of 40,577 contigs were assembled, containing 1379 SSRs. Among these SSRs, dinucleotide repeats were the most frequent (77.08%), and AC was the most frequent motif, followed by AG, AAT and AT. Fifty loci were randomly selected for primer development; of these, 38 loci were successfully amplified and 29 were polymorphic across a panel of 30 individuals. Observed heterozygosity (Ho) ranged from 0.15 to 0.83 and expected heterozygosity (He) from 0.15 to 0.85, with an average of 3.5 alleles per locus. In cross-species tests, 20 of these markers were successfully amplified in S. irregularis, a related and also endangered fish species. This study suggests that PGM™ sequencing is a rapid and cost-effective tool for developing microsatellite markers for non-model species, and the markers developed here should be useful for genetic analyses of Schizothorax.
1. Introduction
Tarim schizothoracin (Schizothorax biddulphi Günther) is a fish species endemic to the Xinjiang Autonomous Region, China. It is a cold-water species distributed only in the Tarim River system, which lies in the arid interior of Central Asia and has unique natural conditions. In the 1960s it was the main commercial fish species in the Tarim River system, once accounting for 80% of the total fish catch in Bostan Lake. However, the population of S. biddulphi has declined dramatically since the 1970s because of overfishing, the threat posed by exotic fishes, and numerous water diversions and dams that block the migration of spawning fish. Zhang et al. reported that the species now survives only as scattered individuals in some rivers and that its range has narrowed and its numbers have declined greatly compared with the data reported in 1991. It was rated as Endangered in the 1998 IUCN Red List of Threatened Animals of China and was listed as a Class II protected species in the Xinjiang Autonomous Region in 2004. To protect the genetic resources of this species and to develop breeding stocks, studies of its genetic differentiation and population structure are needed. However, very few genetic resources are currently available for this species.
Microsatellites have become one of the most popular genetic markers for a wide range of applications in population genetics, conservation biology and evolutionary biology. Their codominant inheritance, high polymorphism, reproducibility and greater information content relative to dominant markers make them particularly suitable for estimating population structure and genetic diversity [4,5]. However, the major drawback of microsatellite markers has historically been the high cost of developing species-specific loci. This problem has been alleviated by next-generation sequencing, which makes the detection and characterization of SSR loci achievable with simple bioinformatics approaches. Random shotgun sequencing is rapid and cost-effective and can identify thousands of useful microsatellite loci in a previously unstudied species [6–8]. Affordable, fast benchtop high-throughput sequencers such as the Ion Torrent Personal Genome Machine™ (PGM™) may now enable reference laboratories to adopt genomic typing routinely, reducing workload and rapidly providing information for further research.
In the present study, we used PGM™ high-throughput sequencing to generate a large genomic sequence resource for S. biddulphi and subsequently developed polymorphic microsatellite loci. We also tested the cross-species utility of these markers in S. irregularis, a related and also endangered fish species.
2. Results and Discussion
2.1. Sequencing by Ion Torrent PGM™
Sequencing on the PGM™ with a 318 chip yielded 892.72 Mb of data and 3,476,226 quality reads in a single run from the genomic DNA of one S. biddulphi individual. Most reads were 250–330 bp long, with an average length of 257 bp. All reads were assembled into 40,577 contigs with a mean length of 395 bp (Table 1). A total of 1379 microsatellites were identified in these contigs; the SSR frequency (SSR number/number of contigs) was 3.4%, and one microsatellite occurred every 11.64 kb of genomic DNA (Table 2). Primers were designed for 1016 microsatellites using BatchPrimer3 (Data S1). Whereas traditional approaches can take weeks or even months to yield only tens of microsatellite loci, the more than 1000 loci identified here required only one or two days from tissue sample through DNA extraction, library construction and titration to sequencing on the PGM™ platform (Table 3). The total cost was only about $950. These results are consistent with reports that the Ion Torrent PGM™ currently offers one of the shortest run times and lowest costs among next-generation sequencers capable of multi-million-read outputs [9,10].
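The summary figures above can be cross-checked against Table 1 with simple arithmetic. The Python sketch below is illustrative only: the total contig length is not reported, so it is approximated as number of contigs × mean contig length (an assumption), and the SSR frequency follows the definition given in the note to Table 2.

```python
def sequencing_summary(total_bases: float, total_reads: int, n_contigs: int,
                       mean_contig_len: float, n_ssrs: int) -> dict:
    """Derive per-read and per-contig statistics from run totals (Table 1 values)."""
    # Mean read length = total bases / number of reads
    mean_read_len = total_bases / total_reads
    # SSR frequency as defined under Table 2: SSR number / number of sequences
    ssr_frequency_pct = 100.0 * n_ssrs / n_contigs
    # Total contig length is not reported, so approximate it from the mean (assumption)
    approx_total_contig_bp = n_contigs * mean_contig_len
    # Mean distance between SSRs = total sequence length / number of SSRs
    mean_distance_kb = approx_total_contig_bp / n_ssrs / 1000.0
    return {
        "mean_read_len_bp": mean_read_len,
        "ssr_frequency_pct": ssr_frequency_pct,
        "mean_distance_kb": mean_distance_kb,
    }


summary = sequencing_summary(892.72e6, 3_476_226, 40_577, 395, 1_379)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With the Table 1 values this gives about 256.8 bp, 3.4% and 11.62 kb. The last value differs slightly from the reported 11.64 kb, presumably because the mean contig length in Table 1 is rounded.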
2.2. Characteristics of Microsatellites
Among the microsatellites detected, dinucleotide repeats were the most frequent (77.08%), followed by tri- (14.58%) and tetranucleotide repeats (7.47%). Penta- and hexanucleotide SSRs were much rarer (0.65% and 0.22%, respectively) (Table 2). This pattern agrees with most previous reports on aquatic animals, such as Ictalurus punctatus, Mogurnda and Nannoperca.
In decreasing order, the 10 most frequent microsatellite motifs were AC, AG, AAT, AT, ATCT, ATG, AAC, AGG, CATT and TAC (Figure 1); together they comprised 93.84% of all microsatellites identified. AC is the most frequent motif in S. biddulphi, as in I. punctatus, Fugu rubripes and Etheostoma okaloosae, but unlike Crassostrea virginica (AG/CT) and Argopecten irradians (TA). Although the predominant motif varies among aquatic animals, GC dinucleotide repeats are extremely rare in all genomes studied [17,18], including those of aquatic animals [11,13,14]. The low frequency of CpG dinucleotides in vertebrate genomes has been attributed to cytosine methylation, which increases the chance that cytosine mutates to thymine by deamination.
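Motif comparisons such as AC versus AG/CT depend on pooling equivalent motifs. The Python sketch below assumes the common convention that a motif, its cyclic rotations and their reverse complements form one class (so AC, CA, GT and TG are the same class), and labels each class by its alphabetically smallest member; that labelling is our choice, so the GC class appears as CG. Enumerating all non-redundant 2–6 bp motifs under this convention reproduces the class counts given in Section 3.3 (4, 10, 33, 102 and 350).

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT', 'AAA')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def canonical_motif(motif: str) -> str:
    """Class label: smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    n = len(motif)
    return min(s[i:] + s[:i] for s in (motif, rc) for i in range(n))


def count_motif_classes(k: int) -> int:
    """Number of distinct classes among all non-redundant motifs of length k."""
    return len({canonical_motif("".join(p))
                for p in product("ACGT", repeat=k)
                if is_primitive("".join(p))})


print(canonical_motif("TG"), canonical_motif("CT"))   # AC AG
print([count_motif_classes(k) for k in range(2, 7)])  # [4, 10, 33, 102, 350]
```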
2.3. SSR Polymorphism
To assess the potential use of the newly developed microsatellites, 50 randomly chosen loci were tested for polymorphism in 30 wild S. biddulphi individuals. Of these, 38 loci were successfully amplified and 29 were polymorphic across the panel. The proportion of verified polymorphic markers (29 of 50, or 58%) was higher than those reported for Gerris incognitus (43.5%), Typha minima (56.7%) and Galeorhinus galeus (40.6%), all sequenced with Roche 454. The 29 polymorphic markers detected 2 to 6 alleles per locus, with an average of 3.5 (Table 4). Ho ranged from 0.15 to 0.83 and He from 0.15 to 0.85. These estimates of genetic diversity differ considerably from those reported by Gong et al., possibly because different SSR loci were used or different populations were sampled. The number of alleles is lower than in many other freshwater fish species [15,24], and heterozygosity is mostly intermediate. Given the decline of its populations, much greater attention should be paid to protecting the genetic diversity of this species. No evidence of null alleles was found at these loci. Four pairs of loci (SCH6 and SCH8, SCH5 and SCH9, SCH5 and SCH10, SCH10 and SCH11) were in linkage disequilibrium, and nine of the 29 polymorphic loci deviated from Hardy–Weinberg equilibrium (HWE; p < 0.05) (Table 4). A possible explanation for the departure from HWE is the dramatic recent decline in spawning populations, with consequent non-random mating and genetic bottlenecks [1,2].
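For readers less familiar with these statistics, the Python sketch below shows one way to compute Na, Ne, Ho and He for a single locus from diploid genotypes. It is not POPGENE's code: He here uses Nei's (1978) unbiased estimator, 2n/(2n − 1) × (1 − Σp²), and Ne = 1/Σp², whereas POPGENE's reported values may be computed differently. The genotypes in the example are hypothetical allele sizes, not data from this study.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # the two allele sizes (bp) of one diploid individual


def locus_diversity(genotypes: List[Genotype]) -> Dict[str, float]:
    """Na, Ne, Ho and unbiased He for one locus from a list of diploid genotypes."""
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    two_n = 2 * n
    sum_p2 = sum((c / two_n) ** 2 for c in allele_counts.values())  # expected homozygosity
    return {
        "Na": len(allele_counts),                        # number of alleles
        "Ne": 1.0 / sum_p2,                              # effective number of alleles
        "Ho": sum(a != b for a, b in genotypes) / n,     # observed heterozygosity
        "He": two_n / (two_n - 1) * (1.0 - sum_p2),      # Nei's unbiased expected heterozygosity
    }


sample = [(240, 240), (240, 244), (244, 248), (248, 248)]  # hypothetical genotypes
print(locus_diversity(sample))  # Na = 3, Ho = 0.5, He = 0.75, Ne ≈ 2.91
```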
2.4. Cross-Amplification in S. irregularis
Cross-species amplification was conducted in S. irregularis. Of the 29 SSR primer pairs tested, 20 (68.97%) amplified successfully and 13 (44.83%) were polymorphic in a pilot panel of six S. irregularis individuals (Table 4). The number of alleles at these 13 loci ranged from 2 to 4, with an average of 2.4 alleles per locus. These markers should therefore be useful for genetic analyses of Schizothorax.
3. Experimental Section
3.1. Samples and Genomic DNA Extraction
A total of 30 S. biddulphi individuals and six S. irregularis individuals were collected from the Tarim River in the Xinjiang Autonomous Region, China. Genomic DNA was extracted from alcohol-preserved caudal fin tissue of these specimens using a phenol/chloroform procedure.
3.2. Ion Torrent PGM™ Library Preparation and Sequencing
An Ion Torrent adapter-ligated library was prepared following the protocol of the Ion Fragment Library Kit (Life Technologies, Invitrogen Division, Darmstadt, Germany; Part #4467320 Rev. A). Briefly, 50 ng of genomic DNA from one individual was end-repaired, and Ion Torrent adapters P1 and A were ligated using DNA ligase. After purification with AMPure beads (Beckman Coulter, Brea, CA, USA), the adapter-ligated products were nick-translated and PCR-amplified for a total of five cycles. The library was purified with AMPure beads (Beckman Coulter), and its concentration and size distribution were evaluated on an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). Emulsion PCR, emulsion breaking and enrichment were performed with the Ion Xpress Template Kit (Part #4467389 Rev. B) according to the manufacturer's instructions. Briefly, template DNA was added to the emulsion PCR master mix together with Ion Sphere Particles (ISPs) at an input concentration of one DNA template copy per ISP, and the emulsion was generated using an IKA DT-20 mixer (Life Technologies, Invitrogen Division, Darmstadt, Germany). The ISPs were then recovered, and template-positive ISPs were enriched using Dynabeads MyOne Streptavidin C1 beads (Life Technologies, Invitrogen Division, Darmstadt, Germany). ISP enrichment was confirmed using a Qubit 2.0 fluorometer (Life Technologies, Invitrogen Division, Darmstadt, Germany), and the sample was prepared for sequencing using the Ion Sequencing Kit protocol (Part #4467391 Rev. B). The complete sample was loaded on an Ion 318 chip and sequenced on the PGM™ for 260 cycles. CLC Genomics Workbench 5 was used for adaptor and poly-A tail trimming and for quality filtering (quality score threshold = 20). The reads were then assembled into contigs with CLC Genomics Workbench 5, specifying a minimum read length of 40 nt, a minimum sequence overlap of 40 nt and a minimum overlap identity of 80%.
The trimmed reads were deposited in the NCBI Sequence Read Archive under accession number SRA059449.
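To illustrate what a quality threshold of 20 means in practice, the Python sketch below implements a modified-Mott style trimming (the approach CLC's quality trimming is documented as being based on), but it is an independent illustration, not CLC's code. A Phred score Q corresponds to an error probability p = 10^(−Q/10), so Q20 gives p = 0.01; each base scores 0.01 − p, and the read is cut down to its highest-scoring stretch. Phred+33 encoding is assumed, and reusing the 40 nt minimum read length from the assembly settings as a post-trimming filter is our assumption.

```python
from typing import List, Optional, Tuple


def phred_error(q: int) -> float:
    """Error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def mott_segment(quals: List[int], q_limit: int = 20) -> Tuple[int, int]:
    """Half-open [start, end) of the segment maximising the sum of (limit - p_error)."""
    limit = phred_error(q_limit)          # Q20 -> 0.01
    best_score, best = 0.0, (0, 0)
    running, start = 0.0, 0
    for i, q in enumerate(quals):
        running += limit - phred_error(q)
        if running <= 0.0:
            running, start = 0.0, i + 1    # restart after a run of poor bases
        elif running > best_score:
            best_score, best = running, (start, i + 1)
    return best


def trim_read(seq: str, qual: str, q_limit: int = 20,
              min_length: int = 40) -> Optional[str]:
    """Quality-trim one read; return None if fewer than min_length bases survive."""
    quals = [ord(c) - 33 for c in qual]   # assumes Phred+33 encoding
    start, end = mott_segment(quals, q_limit)
    trimmed = seq[start:end]
    return trimmed if len(trimmed) >= min_length else None


read = "A" * 5 + "C" * 45 + "G" * 10
qual = "#" * 5 + "?" * 45 + "#" * 10      # '#' = Q2, '?' = Q30
print(trim_read(read, qual))              # the 45 high-quality C bases
```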
3.3. Mining SSR Loci and Primer Design
Simple sequence repeats (SSRs) were mined from the contigs using BatchPrimer3, with minimum repeat numbers of 6, 5, 5, 5 and 5 for di-, tri-, tetra-, penta- and hexanucleotide motifs, respectively. Primers flanking the microsatellites were also designed with BatchPrimer3; the primer sequences are listed in Supplementary Table 1. Following the method of Jurka and Pethiyagoda with minor changes, SSRs were defined as tandem repeats of basic units 2–6 bp long. Theoretically, there are 4 possible dinucleotide motif classes (AT, AG, AC and GC), 10 trinucleotide classes (AAT, AAC, AAG, ATC, ACG, ACT, AGC, GCC, AGG and ACC), 33 tetranucleotide classes, 102 pentanucleotide classes and 350 hexanucleotide classes.
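As an illustration of the mining criteria above (not BatchPrimer3's own implementation), the Python sketch below scans a sequence for perfect SSRs with the same minimum repeat numbers. It uses a back-referencing regular expression and discards motifs that are themselves repeats of a shorter unit, so an (AT)10 run is reported once as a dinucleotide rather than again as (ATAT)5. Imperfect, compound and overlapping repeats are not handled.

```python
import re
from typing import Dict, List, Tuple

# Minimum repeat numbers per motif length, as stated in the text
MIN_REPEATS: Dict[int, int] = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for each perfect SSR, sorted by start."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer captured in group 1, followed by at least (min_rep - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


seq = "GG" + "AC" * 7 + "TTG" + "AAT" * 5 + "G"
print(find_ssrs(seq))  # [(2, 'AC', 7), (19, 'AAT', 5)]
```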
3.4. PCR Amplification and Genotyping
Fifty microsatellites were randomly selected to test for polymorphism. All SSR primer pairs were synthesized by Invitrogen Co. (Shanghai, China), and PCR reagents were purchased from Tiangen Biotechnology Co. Ltd. (Beijing, China). All amplifications were carried out in a 10 μL volume containing 1 μL of 10× Taq DNA polymerase buffer (with Mg²⁺), 100 μM dNTPs, 0.5 μL of primer pair, 1 U of Taq DNA polymerase and 50 ng of genomic DNA. The cycling program was 5 min at 95 °C, followed by 30 cycles of 30 s at 94 °C, 30 s at the optimized annealing temperature (Table 4) and 30 s at 72 °C, with a final extension at 72 °C for 8 min and a hold at 4 °C. PCR products were first checked by 1% agarose gel electrophoresis at 90 V for about 20 min, then separated on 8% non-denaturing polyacrylamide gels at 150 V for 2 h and visualized by silver staining.
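The recipe above gives final amounts (1 U Taq, 50 ng DNA, 100 μM dNTPs) rather than pipetting volumes, and the stock concentrations of the reagents are not stated. The Python sketch below converts the recipe into per-reaction and master-mix volumes using C1V1 = C2V2. The stock concentrations (2.5 mM dNTPs, 5 U/μL Taq, 50 ng/μL DNA) and the 10% overage are hypothetical placeholders, not values from this study.

```python
from typing import Dict

REACTION_VOLUME_UL = 10.0   # per-reaction volume from the text

# HYPOTHETICAL stock concentrations: replace with the values on your own reagent tubes
DNTP_STOCK_UM = 2500.0      # assumed 2.5 mM dNTP stock
TAQ_STOCK_U_PER_UL = 5.0    # assumed 5 U/uL Taq DNA polymerase
DNA_STOCK_NG_PER_UL = 50.0  # assumed 50 ng/uL template DNA


def per_reaction_ul() -> Dict[str, float]:
    """Pipetting volumes (uL) for one reaction, using C1*V1 = C2*V2 where needed."""
    vols = {
        "10x buffer (with Mg2+)": 1.0,                          # 1 uL, as in the text
        "dNTPs": 100.0 * REACTION_VOLUME_UL / DNTP_STOCK_UM,    # 100 uM final
        "primer pair": 0.5,                                     # 0.5 uL, as in the text
        "Taq DNA polymerase": 1.0 / TAQ_STOCK_U_PER_UL,         # 1 U
        "genomic DNA": 50.0 / DNA_STOCK_NG_PER_UL,              # 50 ng
    }
    vols["water"] = REACTION_VOLUME_UL - sum(vols.values())     # make up to 10 uL
    return vols


def master_mix_ul(n_reactions: int, overage: float = 0.1) -> Dict[str, float]:
    """Scale all components except template DNA (added per tube) for n reactions + overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(v * factor, 2)
            for name, v in per_reaction_ul().items() if name != "genomic DNA"}


print(per_reaction_ul())
print(master_mix_ul(30))   # e.g. one locus genotyped in the 30 S. biddulphi individuals
```

Template DNA is left out of the master mix because each individual's DNA is pipetted into its own tube.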
3.5. Data Analysis
The number of alleles (Na), effective number of alleles (Ne), and expected (He) and observed (Ho) heterozygosities were calculated using POPGENE version 1.32. Deviations from Hardy–Weinberg equilibrium (HWE) at each locus and linkage disequilibrium (LD) between all pairs of loci were tested with the online version of GENEPOP (http://genepop.curtin.edu.au/). All results were adjusted for multiple simultaneous comparisons using a sequential Bonferroni correction. The presence of null alleles was checked with MICRO-CHECKER version 2.2.3.
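The sequential Bonferroni correction is commonly implemented as Holm's step-down procedure, and the Python sketch below assumes that variant. The p values are ranked from smallest to largest, the i-th smallest (counting from 1) is compared with α/(m − i + 1), and testing stops at the first non-significant result. The p values in the example are hypothetical, not those of Table 4.

```python
from typing import List


def sequential_bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down correction; returns significance flags in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    significant = [False] * m
    for rank, idx in enumerate(order):                   # rank is 0-based
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # all larger p values are then non-significant as well
    return significant


print(sequential_bonferroni([0.001, 0.04, 0.012, 0.30]))  # [True, False, True, False]
```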
3.6. Microsatellite Markers Cross-Amplification in S. irregularis
To determine the potential for cross-species utility, amplification of the identified markers was assessed in one related species, S. irregularis, which is also an endangered freshwater fish for which no effective molecular markers are available.
4. Conclusions
Taken together, our first experience using the Ion Torrent PGM™ for genome sequencing of fish was very positive with respect to speed, accuracy and cost. PGM™ sequencing proved to be an efficient way to develop SSR markers, although read length and assembly accuracy still need improvement. Much greater attention should also be paid to protecting the genetic diversity of this endangered fish species, and the newly developed microsatellite markers should be useful for its further conservation genetic studies.
The authors thank Jianmin Wu and Abudu from the Administration Bureau of Kezier Reservoir for their kind help in collecting the fish samples. This study was funded by the Hong Kong Ocean Park Conservation Foundation (2011–2012) and the National Natural Science Foundation of China (Nos. 30960299 and 31160526).
- Zhang, R.M.; Guo, Y.; Ma, Y.W.; Turxun. A survey on the resource and distribution of Schizothorax biddulphi Günther. Freshw. Fish. 2007, 37, 76–78.
- Liu, X.J.; Hu, G.F. Threatened fishes of the world: Schizothorax (Schizopyge) biddulphi Günther, 1876 (Cyprinidae). Environ. Biol. Fish. 2009, 85, 97–98.
- Yue, P.Q.; Chen, Y.Y. China Red Data Book of Endangered Animals (Pisces); Science Press: Beijing, China, 1998; pp. 153–155.
- Mariette, S.; Le Corre, V.; Austerlitz, F.; Kremer, A. Sampling within the genome for measuring within-population diversity: Tradeoffs between markers. Mol. Ecol. 2002, 11, 1145–1156.
- Evanno, G.; Regnaut, S.; Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Mol. Ecol. 2005, 14, 2611–2620.
- Castoe, T.A.; Poole, A.W.; Gu, W.; de Koning, A.P.J.; Daza, J.M.; Smith, E.N.; Pollock, D.D. Rapid identification of thousands of copperhead snake (Agkistrodon contortrix) microsatellite loci from modest amounts of 454 shotgun genome sequence. Mol. Ecol. Resour. 2010, 10, 341–347.
- Abdelkrim, J.; Robertson, B.C.; Stanton, J.-A.L.; Gemmell, N.J. Fast, cost-effective development of species-specific microsatellite markers by genomic sequencing. Biotechniques 2009, 46, 185–192.
- Gardner, M.G.; Fitch, A.J.; Bertozzi, T.; Lowe, A.J. Rise of the machines—Recommendations for ecologists when using next generation sequencing for microsatellite development. Mol. Ecol. Resour. 2011, 11, 1093–1101.
- Loman, N.J.; Misra, R.V.; Dallman, T.J.; Constantinidou, C.; Gharbia, S.E.; Wain, J.; Pallen, M.J. Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 2012, 30, 434–439.
- Whiteley, A.S.; Jenkins, S.; Waite, I.; Kresoje, N.; Payne, H.; Mullan, B.; Allcock, R.; O’Donnell, A. Microbial 16S rRNA Ion Tag and community metagenome sequencing using the Ion Torrent (PGM) Platform. J. Microbiol. Methods 2012, 91, 81–88.
- Somridhivej, B.; Wang, S.; Sha, Z.; Liu, H.; Quilang, J.; Xu, P.; Li, P.; Hu, Z.; Liu, Z. Characterization, polymorphism assessment, and database construction for microsatellites from BAC end sequences of channel catfish (Ictalurus punctatus): A resource for integration of linkage and physical maps. Aquaculture 2008, 275, 76–80.
- Meglécz, E.; Nève, G.; Biffin, E.; Gardner, M.G. Breakdown of phylogenetic signal: A survey of microsatellite densities in 454 shotgun sequences from 154 non-model eukaryote species. PLoS One 2012, 7, e40861.
- Edwards, Y.J.K.; Elgar, G.; Clark, M.S.; Bishop, M.J. The identification and characterization of microsatellites in the compact genome of the Japanese pufferfish, Fugu rubripes: Perspectives in functional and comparative genomic analyses. J. Mol. Biol. 1998, 278, 843–854.
- Saarinen, E.V.; Austin, J.D. When technology meets conservation: Increased microsatellite marker production using 454 genome sequencing on the endangered okaloosa darter (Etheostoma okaloosae). J. Hered. 2010, 101, 784–788.
- Wang, Y.; Guo, X. Development and characterization of EST-SSR markers in the eastern oyster Crassostrea virginica. Mar. Biotechnol. 2007, 9, 500–511.
- Zhan, A.B.; Bao, Z.M.; Wang, X.L.; Hu, J.J. Microsatellite markers derived from bay scallop Argopecten irradians expressed sequence tags. Fish. Sci. 2005, 71, 1341–1346.
- Katti, M.V.; Ranjekar, P.K.; Gupta, V.S. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol. Biol. Evol. 2001, 18, 1161–1167.
- Tóth, G.; Gáspári, Z.; Jurka, J. Microsatellites in different eukaryotic genomes: Survey and analysis. Genome Res. 2000, 10, 967–981.
- Schorderet, D.F.; Gartler, S.M. Analysis of CpG suppression in methylated and nonmethylated species. Proc. Natl. Acad. Sci. USA 1992, 89, 957–961.
- Perry, J.C.; Rowe, L. Rapid microsatellite development for water striders by next-generation sequencing. J. Hered. 2011, 102, 125–129.
- Csencsics, D.; Brodbeck, S.; Holderegger, R. Cost-effective, species-specific microsatellite development for the endangered Dwarf Bulrush (Typha minima) using next-generation sequencing technology. J. Hered. 2010, 101, 789–793.
- Chabot, C.L.; Nigenda, S. Characterization of 13 microsatellite loci for the tope shark, Galeorhinus galeus, discovered with next-generation sequencing and their utility for eastern Pacific smooth-hound sharks (Mustelus). Conserv. Genet. Resour. 2011, 3, 553–555.
- Gong, X.; Cui, Z.; Wang, C. Isolation and characterization of polymorphic microsatellite loci from the endangered Tarim schizothoracin (Schizothorax biddulphi Günther). Conserv. Genet. Resour. 2012, 4, 795–797.
- Liu, X.; Luo, W.; Zeng, C.; Wang, W.; Gao, Z. Isolation of new 40 microsatellite markers in mandarin fish (Siniperca chuatsi). Int. J. Mol. Sci. 2011, 12, 4180–4189.
- Sambrook, J.; Russell, D.W. Molecular Cloning: A Laboratory Manual, 3rd ed.; Cold Spring Harbor Laboratory Press: New York, NY, USA, 2002; pp. 479–483.
- You, F.M.; Huo, N.; Gu, Y.Q.; Luo, M.; Ma, Y.; Hane, D.; Lazo, G.R.; Dvorak, J.; Anderson, O.D. BatchPrimer3: A high throughput web application for PCR and sequencing primer design. BMC Bioinformatics 2008, 9, 253.
- Jurka, J.; Pethiyagoda, C. Simple repetitive DNA sequences from primates: Compilation and analysis. J. Mol. Evol. 1995, 40, 120–126.
- Yeh, F.C.; Yang, R.C.; Boyle, T. POPGENE, version 1.32; Molecular Biology and Biotechnology Centre, University of Alberta: Edmonton, AB, Canada, 1999.
- Raymond, M.; Rousset, F. GENEPOP (version 1.2): Population genetic software for exact tests and ecumenicism. J. Hered. 1995, 86, 248–249.
- Van Oosterhout, C.; Hutchinson, W.F.; Wills, D.P.M.; Shipley, P. MICRO-CHECKER: Software for identifying and correcting genotype errors in microsatellite data. Mol. Ecol. Notes 2004, 4, 535–538.
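Table 1. Summary of Ion Torrent PGM™ sequencing and assembly data for S. biddulphi.

| Stage | Statistic | Value |
|---|---|---|
| Sequencing | Total number of bases (Mbp) | 892.72 |
| | Total number of reads | 3,476,226 |
| | Mean length of all reads (bp) | 257 |
| | Longest read (bp) | 399 |
| Assembly | Total number of contigs after assembly | 40,577 |
| | Mean length of contigs (bp) | 395 |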
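Table 2. Distribution of microsatellite repeat types identified in the S. biddulphi contigs.

| Repeat | Number of loci identified | Percentage (%) | Frequency (%) | Mean distance (kb) |
|---|---|---|---|---|

Note: Frequency = SSR number/total number of non-redundant sequences; Mean distance = total length of non-redundant sequences/total SSR number.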
Table 3. Run time and cost of each step of microsatellite development on the Ion Torrent PGM™.

| Step | Run time (h) | Cost (USD) | Instruments or software |
|---|---|---|---|
| Ion Torrent adapter-ligated library preparation | 7–11 | $100 | Common molecular biology equipment |
| Sample emulsion PCR and enrichment | 4–6.5 | $150 | One Touch V26 |
| Sequencing with a 318 chip | 4.5–5.5 | $600 | Ion Torrent V2.0 |
| Sequence assembly | 5–8 | $100 | CLC Genomics Workbench 5 |
| SSR mining and primer design | 5–8 | free | BatchPrimer3 software |
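The time and cost columns of Table 3 can be totalled to check the figures quoted in Section 2.1. The Python sketch below transcribes the table (entering "free" as 0) and sums either all steps or only the three wet-lab steps; DNA extraction is not listed in Table 3, so it is not included.

```python
from typing import List, Tuple

# (step, min run time h, max run time h, cost USD), transcribed from Table 3;
# "free" is entered as 0
STEPS: List[Tuple[str, float, float, int]] = [
    ("Library preparation", 7.0, 11.0, 100),
    ("Emulsion PCR and enrichment", 4.0, 6.5, 150),
    ("Sequencing with a 318 chip", 4.5, 5.5, 600),
    ("Sequence assembly", 5.0, 8.0, 100),
    ("SSR mining and primer design", 5.0, 8.0, 0),
]


def totals(steps: List[Tuple[str, float, float, int]]) -> Tuple[float, float, int]:
    """Sum the minimum time, maximum time and cost over the given steps."""
    return (sum(s[1] for s in steps), sum(s[2] for s in steps), sum(s[3] for s in steps))


print(totals(STEPS))       # (25.5, 39.0, 950): all five steps
print(totals(STEPS[:3]))   # (15.5, 23.0, 850): library preparation to sequencing
```

The five steps together cost $950, matching the total in Section 2.1, and the wet-lab steps take 15.5–23 h, in line with the "one or two days" quoted there.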
Table 4. Characteristics of polymorphic microsatellite loci developed for S. biddulphi and their cross-amplification in S. irregularis.

| Locus/GenBank Accession No. | Primer sequence (5′→3′) | Ta (°C) | Repeat motif | Size range (bp) | Na | Ho | He | PHWE | Na in S. irregularis |
|---|---|---|---|---|---|---|---|---|---|
| SCH1/JX473024 | F: GCCATCCTTCAGTTGTGTCT | 62 | (TATC)7 | 240–288 | 6 | 0.70 | 0.83 | 0.00 * | 4 |
| SCH2/JX473025 | F: CTATGCTCGGTTTCTTTTCA | 57 | (CA)13 | 130–144 | 3 | 0.40 | 0.59 | 0.04 * | 2 |
| SCH5/JX473028 | F: TGAAAGTTCCTTTGCTCCTG | 52 | (TG)10 | 188–234 | 3 | 0.70 | 0.62 | 0.00 * | 2 |
| SCH8/JX473031 | F: AAGGTTGAACAGTTGTTTGC | 54 | (TTA)7 | 105–125 | 3 | 0.75 | 0.61 | 0.04 * | - |
| SCH13/JX473036 | F: TTTCCCCTTAGTCATTTC | 50 | (AG)10 | 100–124 | 3 | 0.55 | 0.62 | 0.03 * | - |
| SCH16/JX473039 | F: CACAGATAAGAACACGAAT | 50 | (CA)23 | 242–288 | 3 | 0.15 | 0.27 | 0.02 * | 1 |
| SCH18/JX473041 | F: TCAATGAGCAACGAAAGAGC | 52 | (AGGCAG)5 | 136–176 | 4 | 0.75 | 0.63 | 0.00 * | 1 |
| SCH23/JX473046 | F: TGACGGTAGAGTCCAGTG | 50 | (CAATTC)5 | 162–184 | 2 | 0.74 | 0.50 | 0.03 * | - |

Ta: annealing temperature; Na: observed number of alleles per locus; Ho: observed heterozygosity; He: expected heterozygosity; PHWE: P value of the Markov chain test for Hardy–Weinberg equilibrium; * denotes a significant departure from HWE after Bonferroni correction (p < 0.05).
© 2012 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).