Development of Microsatellite Markers Using Pyrosequencing in Galium trifidum (Rubiaceae), a Rare Species in Central Europe

We identified a large number of microsatellites from Galium trifidum, a plant species considered rare and endangered in Central and Western Europe. Using a combination of a total enriched genomic library and small-scale 454 pyrosequencing, we obtained 9755 contigs with a length of 100 to 6192 bp. Within this dataset, we identified 153 SSR motifs in 144 contigs. We tested 14 microsatellite loci in two populations of G. trifidum. The number of alleles and expected heterozygosity were 1–8 (mean 3.2) and 0.00–0.876 (0.549 on average), respectively. The markers described in this study will be useful for evaluating genetic diversity within and between populations, and gene flow between G. trifidum populations. These markers could also be applied to investigate the biological aspects of G. trifidum, such as the population dynamics and clonal structure, and to develop effective conservation programs for the Central European populations of this species.


Introduction
Galium trifidum (Rubiaceae) is a long-lived, perennial plant species that can be further divided into four geographically distant subspecies. Three subspecies, G. trifidum ssp. columbianum (Rydb.) Hultén, G. trifidum ssp. subbiflorum (Wieg.) Puff and G. trifidum ssp. halophilum (Fern. et Wieg.) Puff, grow in North America and on the Pacific islands. The fourth, typical subspecies, G. trifidum, can be found in the boreal zone of Europe, Asia and America, where both climate and habitat conditions are optimal for its growth [1,2]. In Central and Western Europe, G. trifidum is a rare and endangered species. A few scattered localities of G. trifidum have been reported from the Polish Lowland (four localities in the Mazury Region), the high Austrian mountains at 1700 m a.s.l. (four localities in the Styrian Alps), France (one locality in the Pyrenees) and Turkey (one locality in the Pontic Mountains) [3–6].
In Northern Europe, G. trifidum occurs along ecotones of different types of water bodies (lakes, rivers, streams) characterized by changeable physical and chemical properties of water. G. trifidum grows there in a wide range of habitats: abundant and highly dynamic populations can be found in habitats providing the best growth conditions, whereas the populations reported from less optimal habitats are smaller and less vigorous. In Central Europe, G. trifidum prefers eutrophic moist lowland bogs and the shores of high-mountain lakes [3,4]. Due to their ecotone character, the habitats of G. trifidum are affected by water-level fluctuations, which often exert an adverse effect on the species, leading to a decline in population size or to the disappearance of entire populations.
However, studies have been conducted to investigate the ecology and population structure of G. trifidum. Long-term protection and management plans aiming at preserving G. trifidum populations should involve habitat and environmental monitoring as well as quantification of genetic diversity within and among populations. Microsatellite markers, also called simple sequence repeat (SSR) markers, are widely used in ecological studies and in investigations of the genetic diversity of populations. The popularity of these markers stems from their near ubiquity, high level of polymorphism, codominance and multi-allelic variation [7]. SSR markers are therefore well suited to investigating the genetic diversity of G. trifidum. In this study, we developed nuclear microsatellite markers for G. trifidum by using GS Junior next generation sequencing (Roche 454 Life Sciences, Branford, CT, USA). In comparison with conventional Sanger sequencing, pyrosequencing supports the acquisition of larger amounts of data within a shorter time.
High-throughput next generation sequencing enables the identification of as many as several hundred polymorphic loci in a single run [8]. To date, this technique has been successfully applied in animal, plant and bryophyte research [9–15]. Fourteen microsatellite markers developed for G. trifidum are characterized in this paper. The polymorphism of the SSR markers was tested in two G. trifidum populations of 19 and 20 individuals, respectively.

Results and Discussion
We tested PCR amplification and the level of polymorphism of the designed SSR motifs. The sequences of the SSR fragments were deposited in GenBank (accession numbers JX273032 to JX273045).
A single run of Galium trifidum DNA library sequencing in the GS Junior pyrosequencing system resulted in 144,275 reads with an average read length of 426 bp. Sequence assembly and mapping to the chloroplast genome of Coffea arabica allowed the alignment of 1708 reads and their contigs to the reference genome. The G. trifidum sequences obtained in the analysis covered approximately 72% of the chloroplast genome of C. arabica, at an average depth of 2.8. The remaining reads were de novo assembled into 9755 contigs with a length of 100 to 6192 bp.
Analysis of the obtained sequences with the msatcommander software identified 153 SSR motifs in 144 contigs. Di- (107) and tri-nucleotide (35) repeats dominated among the discovered microsatellite motifs. Longer repeat motifs included seven tetra-, three penta- and two hexa-nucleotide motifs.
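The two mapping summaries reported above, the fraction of the reference covered and the average depth, can be illustrated with a minimal Python sketch. It is not the GS Reference Mapper computation: the function name `coverage_stats`, the 0-based end-exclusive interval format, and the choice to average depth over covered positions only are assumptions made for illustration.

```python
def coverage_stats(ref_len, alignments):
    """Return (breadth, mean_depth) for reads aligned to one reference.

    ref_len    -- length of the reference sequence in bp
    alignments -- iterable of (start, end) intervals, 0-based, end-exclusive
                  (hypothetical input format, assumed for this sketch)
    """
    depth = [0] * ref_len
    for start, end in alignments:
        # Clip each interval to the reference and count one read per position.
        for pos in range(max(start, 0), min(end, ref_len)):
            depth[pos] += 1

    covered = sum(1 for d in depth if d > 0)
    breadth = covered / ref_len
    # Assumption: depth is averaged over covered positions only.
    mean_depth = sum(depth) / covered if covered else 0.0
    return breadth, mean_depth


# Toy example: two overlapping reads on a 10 bp reference.
breadth, mean_depth = coverage_stats(10, [(0, 4), (2, 6)])
print(breadth, mean_depth)
```

Averaging over all reference positions instead of covered positions would give a lower depth value; the text does not state which convention was used.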
We designed primers for 60 of the identified SSR motifs. The 14 microsatellite loci identified in the study showed a clear, single peak for each allele. These 14 loci were subsequently used to screen 39 individuals collected from two populations of G. trifidum, one from northern Finland and one from Austria. In the studied populations, 12 loci showed polymorphism, while two loci (Gal10, Gal12) were monomorphic (Table 1). The number of alleles per locus ranged from one to eight, with an average of 2.3 in the Austrian population and 2.6 in the Finnish population. The expected heterozygosity (HE) ranged from 0.000 to 0.876 (0.549 on average) (Table 2). The values of HE were similar in the Austrian and Finnish populations, reaching HE = 0.438 and HE = 0.431, respectively. Significant deviations (p < 0.05) from Hardy-Weinberg equilibrium (HWE) were detected for loci Gal02 and Gal07 in the Finnish population, which suggests the presence of null alleles. Significant linkage disequilibrium (LD) was noted between three pairs of loci: Gal02/Gal13, Gal04/Gal14 and Gal07/Gal11.
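For readers unfamiliar with the diversity statistics above, the following Python sketch shows how the number of alleles and expected heterozygosity can be computed for a single locus, using the standard formula HE = 1 - sum(p_i^2) over allele frequencies p_i. It assumes diploid genotypes scored as pairs of allele sizes; the function name `allele_stats` and the example genotypes are hypothetical and are not data from this study.

```python
from collections import Counter


def allele_stats(genotypes):
    """Return (number_of_alleles, observed_het, expected_het) for one locus.

    genotypes -- list of (allele_a, allele_b) tuples, one per individual
                 (diploid scoring is an assumption of this sketch)
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    total = len(alleles)

    # Expected heterozygosity: 1 minus the sum of squared allele frequencies.
    expected_het = 1.0 - sum((c / total) ** 2 for c in counts.values())
    # Observed heterozygosity: fraction of individuals with two different alleles.
    observed_het = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return len(counts), observed_het, expected_het


# Hypothetical genotypes (allele sizes in bp) for four individuals.
polymorphic = [(150, 150), (150, 154), (154, 154), (150, 154)]
monomorphic = [(100, 100), (100, 100), (100, 100)]
print(allele_stats(polymorphic))
print(allele_stats(monomorphic))
```

A monomorphic locus has a single allele with frequency 1, so its expected heterozygosity is 0, which corresponds to the lower bound of the range reported above.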

DNA Extraction
Total genomic DNA was extracted from the leaf tissue of 20 and 19 individuals from one Finnish and one Austrian population, respectively, using the DNeasy® Plant Mini Kit (Qiagen, Hilden, Germany). Stems were ground with silica beads in a MiniBead-Beater tissue disruptor for 50 s and subsequently processed according to the manufacturer's protocol. DNA quantity was estimated with the Qubit fluorometer system (Invitrogen, Carlsbad, CA, USA) using the Quant-iT dsDNA BR Assay Kit (Invitrogen).

DNA Library Preparation and Sequencing
Eight hundred nanograms of DNA was sheared by nebulization, purified with the MinElute PCR Purification Kit (Qiagen), and subsequently processed according to the GS Rapid Library Preparation Kit Method Manual (Roche/454 Life Sciences). The quality of the DNA library was assessed by gel electrophoresis in the FlashGel System (Lonza). DNA fragments were clonally amplified using the GS Junior Titanium emPCR Lib-L Kit (Roche/454 Life Sciences). Sequencing was performed using the GS Junior pyrosequencing system, according to the Sequencing Method Manual (Roche/454 Life Sciences).
Pyrosequencing data were assembled using GS Reference Mapper software (Roche/454 Life Sciences). A two-step assembly was performed. First, the obtained sequences were assembled using the chloroplast genome of Coffea arabica (GenBank: NC_008535) to separate chloroplast reads from nuclear reads. The remaining reads were assembled using the GS Newbler de novo assembler (Roche/454 Life Sciences).
The obtained contigs were searched for microsatellite motifs using MSATCOMMANDER with default settings [16]. This program was also used for primer design. To avoid designing primers for any potential SSR locus twice, the contigs containing the same motif were compared in BioEdit 7.0.5 [17].
Fourteen randomly selected amplicons were resequenced using the amplification primers. Purified PCR products were sequenced in both directions using the ABI BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA), and were visualized using an ABI Prism 3130 Automated DNA Sequencer (Applied Biosystems).
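The motif search described above can be illustrated with a short regular-expression sketch in Python. This is not the MSATCOMMANDER algorithm and does not reproduce its default settings: the minimum repeat counts in `MIN_REPEATS`, the function name `find_ssrs`, and the example sequence are assumptions chosen only to show the idea of scanning contigs for perfect di- to hexa-nucleotide repeats.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed thresholds,
# not the defaults of any particular program).
MIN_REPEATS = {2: 6, 3: 4, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" (mononucleotide run) or "ATAT" (dinucleotide repeat).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical contig fragment with one (AT)7 and one (GAA)5 repeat.
contig = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
print(find_ssrs(contig))
```

Only perfect, uninterrupted repeats are reported here; compound or interrupted microsatellites would need additional logic.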

Data Analysis
Genetic diversity measures were estimated using GenAlEx 6.41 software [19]. Deviations from the Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium between loci were tested using FSTAT software version 2.9.3 [20]. Significance levels were adjusted using Bonferroni correction for multiple testing.
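The Bonferroni adjustment mentioned above divides the nominal significance level by the number of tests performed. The Python sketch below shows the arithmetic, including the number of locus pairs that arise in a pairwise linkage disequilibrium analysis. It is a generic illustration and not the FSTAT procedure; the function names and the example p-values are hypothetical.

```python
from math import comb


def n_pairwise_tests(n_loci):
    """Number of distinct locus pairs among n_loci loci."""
    return comb(n_loci, 2)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def flag_significant(p_values, alpha=0.05):
    """Return names of tests whose p-value passes the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p <= threshold]


# Hypothetical p-values for three tests (not data from this study).
example = {"test_a": 0.001, "test_b": 0.020, "test_c": 0.300}
print(n_pairwise_tests(12))           # locus pairs among 12 polymorphic loci
print(bonferroni_threshold(0.05, 3))  # corrected threshold for three tests
print(flag_significant(example))      # tests passing the corrected threshold
```

The correction is conservative: as the number of locus pairs grows, the per-test threshold shrinks proportionally, so only strong associations remain significant.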

Conclusions
The 14 microsatellite loci described here will be useful for evaluating genetic diversity within and between populations, and gene flow between G. trifidum populations. These markers could also be applied to investigate the biological aspects of G. trifidum, such as the population dynamics and clonal structure, and to develop effective conservation programs for the Central European populations of this species.