Development and Characterization of 18 Novel EST-SSRs from the Western Flower Thrips, Frankliniella occidentalis (Pergande)

The western flower thrips, Frankliniella occidentalis (Pergande), is an invasive species and the most economically important pest within the insect order Thysanoptera. For a better understanding of the genetic makeup and migration patterns of F. occidentalis throughout the world, we characterized 18 novel polymorphic EST-derived microsatellites. The mutational mechanism of these EST-SSRs was also investigated to facilitate the selection of appropriate combinations of markers for population genetic studies. Genetic diversity of these novel markers was assessed in 96 individuals from three populations in China (Harbin, Dali, and Guiyang). The results showed that all these 18 loci were highly polymorphic; the number of alleles ranged from 2 to 15, with an average of 5.50 alleles per locus. The observed (HO) and expected (HE) heterozygosities ranged from 0.072 to 0.707 and 0.089 to 0.851, respectively. Furthermore, only two locus/population combinations (WFT144 in Dali and WFT50 in Guiyang) significantly deviated from Hardy–Weinberg equilibrium (HWE). Pairwise FST analysis showed a low but significant differentiation (0.026 < FST < 0.032) among all three pairwise population comparisons. Sequence analysis of alleles per locus revealed a complex mutational pattern of these EST-SSRs. Thus, these EST-SSRs are useful markers but greater attention should be paid to the mutational characteristics of these microsatellites when they are used in population genetic studies.


Introduction
The western flower thrips, Frankliniella occidentalis (Pergande), is the most economically important pest within the insect order Thysanoptera, which includes more than 5500 described species [1].
F. occidentalis causes enormous damage by feeding directly on greenhouse vegetable and ornamental crops and by transmitting plant-pathogenic tospoviruses [2]. It is endemic to North America, in an area west of the Rocky Mountains extending from Mexico to Alaska [3]. Since the late 1970s, F. occidentalis has rapidly invaded most countries throughout the world, where it not only causes severe economic losses but also threatens endemic invertebrates and associated ecosystems [4]. To control F. occidentalis, it is first necessary to know its genetic diversity, population structure and invasion history. Genetic tools such as microsatellite markers can reveal the origin of newly established populations, their genetic makeup and their routes of migration [5,6].
Microsatellites, or simple sequence repeats (SSRs), consist of tandemly repeated motifs 1-6 bp in length and are widely distributed throughout eukaryotic genomes [7]. Conventionally, two mutation models have been considered for microsatellites: the stepwise mutation model (SMM) and the infinite allele model (IAM). The SMM states that every mutational event changes an allele by a single repeat only, whereas the IAM assumes that every mutation creates a new allele [8]. The mutational mechanism of microsatellites is still under debate, though slippage during DNA replication appears to be the most likely cause [9]. Several other mechanisms may also generate new alleles, e.g., insertions/deletions (indels) in the flanking region [10]. Matsuoka showed that the IAM was appropriate for maize microsatellites that had mutated in the flanking regions [10]. Knowledge of the mutational pattern of a specific SSR could therefore facilitate the selection of an appropriate mutation model and of suitable marker combinations in population genetic studies.
Because of their codominant inheritance, high polymorphism, easy detection by polymerase chain reaction (PCR) and broad distribution in the genome, microsatellites are now widely used in population genetic studies [11]. Ascunce et al. used a large number of SSRs to investigate the global invasion route of the fire ant Solenopsis invicta [6]. However, population genetic studies of F. occidentalis have been hampered by a lack of polymorphic molecular markers; at present, only 6 polymorphic microsatellites of F. occidentalis are known [12]. Recently, a large number of expressed sequence tags (ESTs) of F. occidentalis have become available in the public sequence database [13] and can be exploited to identify markers inexpensively. Hence, we isolated and characterized 18 novel EST-SSRs for F. occidentalis. These EST-SSRs will allow researchers to investigate the genetic diversity and population genetic structure of F. occidentalis in its native and invasive ranges and to trace its global invasion history.

Characteristics of F. occidentalis EST-SSRs
We obtained 309 sequences containing SSRs by MIcroSAtellite (MISA) [14] analysis. Among these sequences, five contained two different SSRs, and three of these were compound microsatellites (Table 1). The EST-SSR frequency of F. occidentalis (1 SSR/24.1 kb) was lower than that of the brown planthopper (1 SSR/13.0 kb; [15]), the pea aphid (1 SSR/3.0 kb; [16]) and several other insects (~1 SSR/1 kb in fly, silkworm and mosquito; [17]), and was comparable to that of some crops (1 SSR/23.80 kb in soybean and 1 SSR/28.32 kb in maize; [18]). The most abundant repeat motif class was dinucleotide repeats (DNRs, 265/314). The AC/GT motif (41.5%) was the most common among DNRs, followed by AG/CT (31.7%), AT/AT (22.3%) and CG/CG (4.5%). Repeats were grouped into classes according to the method of Jurka and Pethiyagoda [19]. For example, (AC)n, (CA)n, (TG)n and (GT)n were treated as the same class, because they differ only by complementary strand and/or reading frame. Other repeat motifs, including tri-, tetra- and pentanucleotide repeats, were also observed, albeit infrequently (49/314) (Table 1).
Of the primer pairs designed from the 122 sequences suitable for primer design, 72 amplified the expected products and 50 yielded larger or no products. Finally, 18 primer pairs revealed polymorphism (Table 2); the remaining 54 were either monomorphic or amplified poorly. For these 18 selected ESTs, BLASTX homology searches against the NCBI nr database found three ESTs with significant hits to insect genes at an E-value cutoff of 1e-5 (Table 2). No hit was found for any of the other 15 ESTs. Sequence length variation in coding sequences is rare, and examination of the three sequences with significant BLAST hits suggested that the SSRs may be located in either the 5′- or 3′-untranslated region (UTR). The 15 remaining ESTs of unknown function may also originate from non-coding regions. The selected ESTs seem unlikely to come from non-insect sources, because these EST-SSRs amplified at a high rate (approximately 99.5%) across the 96 F. occidentalis samples. When all published F. occidentalis ESTs were analyzed, 17 of the 18 selected ESTs were singletons; the remaining one (GT306150) was part of a larger contig that contained only two ESTs. The low copy number of these ESTs suggests that they are unlikely to be repetitive sequences in the nuclear genome.
When all three populations were considered together (Table 3), 99 alleles were identified from the 18 markers; the number of alleles (Na) ranged from 2 to 15, with an average of 5.50 alleles per locus. The observed (HO) and expected (HE) heterozygosities ranged from 0.072 to 0.707 and from 0.089 to 0.851, respectively. The PIC values ranged from 0.088 to 0.860, with an average of 0.476 (Table 2).
After sequential Bonferroni correction for multiple tests, only WFT144 in Dali and WFT50 in Guiyang deviated significantly from Hardy-Weinberg equilibrium (HWE), possibly owing to null alleles, whose presence was supported by the MICRO-CHECKER [20] analysis (Table 4). In addition, no stuttering bands, large-allele dropout or significant genotypic linkage disequilibrium was detected. Genetic diversity analysis indicated that Dali displayed the highest number of alleles (Na = 4.944) and expected heterozygosity (HE = 0.522) and Harbin the lowest (Na = 4.389; Table 4).
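The sequential Bonferroni correction applied to the HWE tests can be sketched as follows. This is a minimal illustration in Python, assuming the usual step-down form of the procedure (the smallest P-value is compared with alpha/m, the next with alpha/(m - 1), and so on, stopping at the first non-significant test). The function name and the example P-values are hypothetical and are not taken from this study or from any of the programs it used.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Flag tests that remain significant after a step-down Bonferroni correction.

    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P-value
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The threshold relaxes as tests are accepted: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # all larger P-values are non-significant as well
    return significant


# Hypothetical P-values for four locus/population tests
print(sequential_bonferroni([0.001, 0.04, 0.03, 0.0005]))
# [True, False, False, True]
```

If all 18 loci in all three populations were corrected as one family of 54 tests (an assumption; the family size is not stated in the text), the smallest P-value would have to fall below 0.05/54, roughly 0.0009, to be declared significant.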

Mutations of EST-SSRs
Ninety-five different alleles, together accounting for approximately 98% of the total allele frequency, were successfully sequenced. All the sequences obtained corresponded exactly to the expected EST sequences. The other 4 rare, low-frequency alleles were not sequenced. Sequence analysis of these alleles revealed that three types of mutational events are responsible for the generation of new alleles (six loci illustrating the three mutation patterns are shown in Figure 1; the other 12 are shown in Figures S1 and S2). First, size variation among sequenced alleles was explained by differences in the number of repeat motifs for 7 microsatellites (Figure 1A, Figure S1). Second, at the WFT51, WFT83, WFT108 and WFT124 loci, two different repeat motifs, one of them in the flanking region, were found, and both contributed to the allele-size variation (Figure 1B, Figure S2A). Third, indels in the flanking region were observed at 7 loci (WFT20, WFT66, WFT87, WFT104, WFT139, WFT141 and WFT144; Figure 1C, Figure S2B), although the frequencies of these alleles were very low at four loci (WFT20: 0.088; WFT66: 0.088; WFT87: 0.144 and WFT139: 0.021). Besides these three mutation patterns, base substitutions in the repeat region and in the flanking region were each observed at 9 loci; they did not contribute to length changes of the microsatellites. In addition, several loci showed more than one of the mutation types mentioned above, e.g., WFT37 contained both base substitutions in the flanking region and stepwise mutation in the repeat region, and WFT104 had both indels in the flanking region and stepwise mutation in the repeat motif.
A minimum number of contiguous repeats might be necessary for slippage to occur. This minimum has been suggested to be four for dinucleotide repeats and two for tri- and tetranucleotide repeats [21,22]. Using these criteria, we calculated the frequencies of slippage-consistent and slippage-inconsistent mutations. Allele-size variation came mainly from slippage at the repeat motifs: sixty-six alleles from the 18 loci, with a frequency of 75.5%, showed this mutation mechanism. Twenty-two alleles from 7 loci showed slippage in the flanking region, with a frequency of 19.9%. Mutation mechanisms other than slippage also occurred in our microsatellites, with frequencies of 12.6% (20 alleles from 6 loci) in the repeat motifs and 8.4% (11 alleles from 7 loci) in the flanking region. Overall, slippage in the repeat motif and flanking region was the main mutation mechanism for the newly developed microsatellites.
Length changes in microsatellite DNA are generally thought to arise from replication slippage [9]. However, a complex mutational pattern of F. occidentalis EST-SSRs was observed in this study. The same mutational patterns (changes in the number of microsatellite repeat units, base substitutions and indels within the flanking region) have also been found in microsatellites of other insects [15,23] and other organisms, including maize [24] and birds [25]. Complex mutational patterns therefore appear to be common in eukaryotic genomes. Zhu et al. showed that indel slippage, or length-independent slippage, tends to duplicate short sequences [26]. The number of repeat motifs in F. occidentalis ESTs was low (n < 9; Table 1), suggesting that indel slippage may be responsible for the complex mutational pattern of EST-SSRs in F. occidentalis.
Global and pairwise FST and RST among the three populations were then calculated. FST assumes an infinite allele model and RST assumes a stepwise mutation model [27]. Global FST and RST considering all 18 loci showed a low but significant differentiation (global FST = 0.029, P < 0.001; global RST = 0.023, P < 0.001) among all three populations (Table 5). Moreover, including the loci that have one SSR and/or indels in the flanking region did not significantly change the global FST and RST values, whose 95% confidence intervals overlapped (Table 5). For the same combinations of loci, the global and pairwise FST and RST values did not differ significantly from each other, again with overlapping 95% confidence intervals (Table 5). However, no clear correlation was found between the pairwise estimates of FST and RST (Spearman r = −0.202; P = 0.264). Pairwise FST results were consistent in all cases: Dali/Guiyang exhibited the lowest differentiation estimates and Harbin/Guiyang the highest. This pattern was not reflected in the pairwise RST results, which might be because the microsatellites that mutated in the flanking region did not strictly conform to the IAM and/or the SMM. Eleven of the 18 loci showed multiple sources of length variation, which cannot be explained solely by the gain or loss of one or two repeats as assumed by SMM-based models. Thus, methods based on the IAM might be appropriate for many loci in our study, although this was not supported by our analysis. Anderson also suggested that the IAM was more appropriate for microsatellites of the parasite Plasmodium falciparum, which have complex mutation patterns [28]. Furthermore, the precision of the global differentiation estimates improved (the confidence intervals narrowed) as the number of loci analyzed increased (Table 5). Thus, users of the described microsatellites who need precise estimates should use more loci.
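To make the allele-identity logic behind FST concrete, the following Python sketch computes a simplified single-locus estimate from allele frequencies as (HT - HS)/HT, where HS is the mean within-population expected heterozygosity and HT is the expected heterozygosity of the pooled (averaged) frequencies. This is an assumption-laden illustration: it weights populations equally and applies no sample-size correction, so it is not the estimator implemented in Arlequin or RST CALC, and the function name and the allele frequencies are hypothetical. Because alleles are compared only by identity, two alleles differing by one repeat count the same as two alleles differing by ten, which is the sense in which FST follows the IAM, whereas RST uses allele-size differences.

```python
def simple_fst(pop_freqs):
    """Single-locus FST-like statistic, (HT - HS) / HT.

    pop_freqs: list of dicts mapping allele -> frequency, one dict per population.
    Populations are weighted equally; no sample-size correction is applied.
    """
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)  # all alleles seen in any population
    # HS: average within-population expected heterozygosity
    h_s = sum(1 - sum(p * p for p in f.values()) for f in pop_freqs) / k
    # HT: expected heterozygosity computed from the mean allele frequencies
    mean_freq = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    h_t = 1 - sum(p * p for p in mean_freq.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical frequencies of two alleles (sizes in bp) in two populations
pops = [{150: 0.8, 152: 0.2}, {150: 0.6, 152: 0.4}]
print(round(simple_fst(pops), 4))  # 0.0476
```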

EST Database Mining
A total of 13,839 F. occidentalis EST sequences were obtained from GenBank [29]. EST-trimmer [30] was then used to remove poly(A/T) stretches from the 5′ or 3′ ends until no (A)5 or (T)5 remained within a range of 50 bp. EST sequences shorter than 100 bp were excluded, and those longer than 700 bp were clipped at their 5′ end to preclude the inclusion of low-quality sequences [31]. The resulting sequences were screened with MISA [14] for microsatellites containing at least five di-, five tri-, four tetra-, four penta- and four hexa-nucleotide repeats. PCR primers flanking the microsatellite repeats were designed using Primer Premier 5.0 [32]. The selected ESTs were compared to the NCBI nr protein database using the BLASTX program. A suggested cut-off value of 1e-5 was chosen to assign a potential homologue to each EST sequence [13].
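The repeat thresholds used in the MISA screen can be illustrated with a short regular-expression search. The sketch below is written in Python and is not MISA itself: it finds only perfect repeats on the given strand, does not merge complementary motifs into classes, and does not detect compound microsatellites. The function name `find_ssrs` and the example sequence are hypothetical.

```python
import re

# Minimum repeat number per motif length, as stated in the text:
# five di-, five tri-, four tetra-, four penta- and four hexa-nucleotide repeats
MIN_REPEATS = {2: 5, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) more copies
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are copies of a shorter unit (e.g. "AA", "ACAC")
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // size))
    return sorted(hits)


# Hypothetical EST fragment with an (AC)6 and an (ATG)5 repeat
example = "TTGCA" + "AC" * 6 + "GGTCA" + "ATG" * 5 + "CCT"
print(find_ssrs(example))  # [(5, 'AC', 6), (22, 'ATG', 5)]
```

Start positions are 0-based. A run such as (AC)4 is not reported, because it falls below the five-repeat threshold for dinucleotide motifs.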

Sample Collection and DNA Extraction
In total, 96 F. occidentalis female adults were sampled from 3 sites in China between July 2010 and July 2011 (Table 3). Total genomic DNA was extracted by homogenizing a single female adult in 50 µL of STE buffer (100 mM NaCl, 10 mM Tris-HCl, 1 mM EDTA, pH 8.0) in a 1.5 mL Eppendorf tube. The mixture was incubated with 2 µL proteinase K (10 mg/mL) at 37 °C for 30 min, followed by 5 min at 95 °C. The samples were centrifuged briefly and either used immediately for PCR or stored at −20 °C.

Primer Testing
The forward primer of each set was tailed with U19 (GGTTTTCCCAGTCACGACG) to facilitate labeling. PCR amplifications were performed on an Applied Biosystems Veriti™ Thermal Cycler (Applied Biosystems). Each 10 μL amplification mixture contained 1 × PCR buffer, 0.2 mM of each dNTP, ~50 ng of DNA, 0.25 units of Maxima Hot Start Taq DNA polymerase (Fermentas, Canada), 0.04 μM of the forward primer, and 0.2 μM each of the reverse primer and the dye-labeled U19 primer (FAM, VIC, NED or PET). The cycling conditions were an initial denaturation for 4 min at 95 °C; 10 cycles of 95 °C for 30 s, 51 °C for 30 s and 72 °C for 30 s; 25 cycles of 95 °C for 30 s, 54 °C for 30 s and 72 °C for 30 s; and a final extension at 72 °C for 10 min. PCR products were run on an ABI 3130 capillary sequencer along with the GeneScan-500 LIZ size standard, and allele sizes were determined using GENEMAPPER version 4.0 (Applied Biosystems).

Allele Sequencing
The different alleles detected at each locus by the capillary sequencer were amplified in 50 μL PCR reactions with non-fluorescently labeled primers (conditions as above, with a specific annealing temperature of 52 °C). The PCR products were purified using the Axygen cleanup kit, ligated into the pGEM-T vector (Promega) and introduced into Escherichia coli DH5α cells. Six positive clones for each allele were sequenced to exclude PCR artefacts. Alignments of the sequenced alleles were generated using the Clustal X 2.0.11 program [33]. Several loci were then manually aligned using BioEdit 7.0.4 [34].

Data Analysis
All genetic analyses were based on the genotyping data from the three populations. MICRO-CHECKER 2.2.3 was used to detect genotyping errors due to null alleles, stuttering or allele dropout, using 1000 randomizations [20]. The program Genepop 4.0.10 [35] was used to test for linkage disequilibrium between pairs of loci in each population (100 batches, 1000 iterations per batch) and for deviations from Hardy-Weinberg equilibrium (HWE) at each locus/population combination using Fisher's exact tests. Population genetic diversity indices, namely total alleles per locus (NA), observed heterozygosity (HO), expected heterozygosity (HE) and mean number of alleles (Na), were assessed using GenAlEx 6.41 [36]. We also calculated the polymorphism information content (PIC) using CERVUS version 3.0 [37]. Pairwise FST and RST values and their significance for each population comparison were calculated with 10,000 permutations in Arlequin 3.0 [38] and RST CALC 2.2 [27], respectively.
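The per-locus diversity indices named above follow from simple formulas, sketched here in Python for one locus. With allele frequencies p_i, the sketch uses HE = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², and takes HO as the proportion of heterozygous individuals. It is an illustration only: it does not reproduce GenAlEx or CERVUS, applies no small-sample correction to HE, and the function name and genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Diversity indices for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    # Observed heterozygosity: fraction of individuals carrying two different alleles
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under Hardy-Weinberg proportions (uncorrected)
    he = 1 - sum(p * p for p in freqs)
    # PIC subtracts the pairwise term 2 * p_i^2 * p_j^2 over all allele pairs
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {"Na": len(counts), "HO": ho, "HE": he, "PIC": pic}


# Hypothetical genotypes (allele sizes in bp) for four individuals
print(locus_stats([(150, 150), (150, 152), (152, 152), (150, 152)]))
# {'Na': 2, 'HO': 0.5, 'HE': 0.5, 'PIC': 0.375}
```

For two equally frequent alleles, HE is 0.5 and PIC is 0.375, which shows why PIC is always somewhat lower than HE for the same locus.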

Conclusions
In summary, 18 highly polymorphic EST-SSRs have been specifically developed for F. occidentalis in this study. Sequence analysis of alleles per locus revealed a complex mutational pattern of these EST-SSRs. Thus, these EST-SSRs are useful markers for the invasive species F. occidentalis, but greater attention should be paid to the mutational characteristics of these markers when they are used in population genetic studies.

Figure 1. Mutational patterns of EST-SSRs in Frankliniella occidentalis. (A) Microsatellites mutated in the repeat motif; (B) microsatellites that have one SSR in the flanking region; (C) microsatellites that have indels in the flanking region. Each base is indicated by a different color. The repeat motifs are shown in black boxes. The red box indicates the repeat motifs in the flanking region and the blue box indicates the indels in the flanking region.

a including the 7 loci mutated in the repeat motif; b including the 7 loci mutated in the repeat motif and 4 loci which have one SSR in the flanking region; c including the 7 loci mutated in the repeat motif and 7 loci which have indels in the flanking region; Bold indicates significance after Bonferroni correction (P = 0.05); Values in parentheses indicate 95% confidence intervals.

Table 1. Frequency and distribution of SSRs in the analyzed Frankliniella occidentalis ESTs.
Ta: annealing temperature; N: number of analyzed individuals; Na: number of alleles detected; HO: observed heterozygosity; HE: expected heterozygosity; PIC: polymorphism information content; a Based on BLASTX analysis. The species source and accession number of the best hit(s) are indicated, together with the E-value for the match.

Table 3. Collection information for samples used in this study.
N: number of analyzed individuals; Na: number of alleles detected; HO: observed heterozygosity; HE: expected heterozygosity; r: null allele frequency; Bold indicates deviations from Hardy-Weinberg equilibrium after sequential Bonferroni correction for multiple tests (P < 0.05).

Table 5. Pairwise FST (below the diagonal) and RST (above the diagonal) matrix using different combinations of EST-SSRs of Frankliniella occidentalis.