High-Throughput SSR Marker Development and the Analysis of Genetic Diversity in Capsicum frutescens

Capsicum frutescens, one of the domesticated pepper species grown worldwide, is valued for its strong resistance to plant pathogens, high productivity, and intense aroma. However, a shortage of molecular markers limits the efficiency and accuracy of genetic breeding in pepper. With next-generation sequencing technology, genome sequences of C. frutescens can now be generated and mined for SSR markers. In this study, a total of 278,425 SSRs were detected in the pepper genome using MISA software. Trinucleotides were the dominant repeat type, followed by dinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides. (AT)n, (TTG)n, (AAAT)n, (AAATA)n, and (TATAGA)n were the most common motifs among the dinucleotide to hexanucleotide repeats, respectively. In addition, a total of 240 SSR primers evenly distributed over all 12 chromosomes were designed and screened against 8 C. frutescens cultivars. Of these, 33 highly polymorphic SSR markers were used to genotype 147 accessions from 25 countries. The resulting dendrogram clustered these accessions into seven major groups, which were consistent with their origins. The results of this study provide SSR molecular marker resources and insight into the genetic diversity of C. frutescens.


Introduction
Pepper is an indispensable spice, as well as an important vegetable crop that is cultivated around the globe. It originated in South America and belongs to the genus Capsicum (Solanaceae) [1][2][3]. This genus includes many cultivated and wild species. However, only five species are commonly cited in the current literature as domesticated culinary species: C. annuum, C. chinense, C. frutescens, C. baccatum, and C. pubescens [4]. Among these, C. annuum is considered the predominant species and comprises many commercial varieties that vary widely in fruit size, shape, and color and, in particular, in pungency. However, after a long period of artificial selection, continuous cultivation, and domestication, the characteristics of pepper have become diversified while its genetic background has narrowed. Consequently, the reduced genetic diversity has created difficulties in the production of pungent (hot) pepper varieties. This has entailed searching and restoring potent traits classifying them into seven groups and validating the findings, the results were in exact congruence with the basic classification of the pepper species.
Pepper germplasm resources are the primary material basis for the breeding and production of pepper crops. However, the genetic background of the pepper germplasm resources in China is relatively narrow. Emphasis should therefore be placed on collecting and properly exploiting wild germplasm resources in order to obtain material of high utilization value. Among the available resources, 'Xiaomijiao' is the only wild C. frutescens plant found in China. This research therefore aimed to develop SSR molecular markers for C. frutescens on the basis of the 'Xiaomijiao' sequence data (unpublished), and then to analyze the genetic diversity of the collected germplasm resources at the genome level using these SSR markers. The new polymorphic microsatellite markers provide a basis for further population research. In this paper, the genetic diversity of 147 C. frutescens germplasm accessions from 25 countries was analyzed to understand the genetic relationships and genetic composition of the accessions, so as to provide a basis for more effective utilization of these germplasm resources in the future.

Plant Materials and DNA Extraction
A total of 147 pepper (C. frutescens) accessions, collected from 25 countries, were used in this study. The details of the samples, including their English names, cultivation regions, DNA concentrations (ng/µL), and type (wild/cultivated), are summarized in Supplementary File S1. After sampling, young leaves were immediately frozen in liquid nitrogen and transferred to −80 °C for later DNA extraction. Genomic DNA was then extracted using the CTAB (cetyltrimethylammonium bromide) method described by Murray and Thompson [34]. The quality of the isolated DNA was assessed using a NanoDrop OneC Microvolume UV-Vis Spectrophotometer. Finally, the DNA concentrations were adjusted to 10 to 35 ng/µL for use in the subsequent polymerase chain reactions (PCR).

Source of Genic Sequences, SSR Identification and Primer Design
In this study, the wild pepper 'Xiaomijiao' (C. frutescens) was sequenced at Beijing Nuohe Zhiyuan Technology Co., Ltd. Genome sequencing was performed with Illumina HiSeq4000 (300× coverage) and PacBio Sequel (30× coverage). The assembled sequences, which totaled 2.95 Gbp, were used to characterize the distribution of microsatellites in the pepper genome. The completeness of the 2.95 Gbp assembly is supported by the mapping of over 99% of ~3 million EST reads (generated using HiSeq4000 technology) from 'Xiaomijiao' (C. frutescens) leaf, stem, and root tissues. The MISA (MIcroSAtellite) identification tool was used to identify SSRs [35]. Motifs of 2 to 6 nucleotides were considered when searching for microsatellites. The minimum numbers of repeat units for dinucleotides, trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides were set to 6, 5, 4, 4, and 4, respectively. Then, based on the MISA results, Primer 5 software was used to design the SSR primers. Primers were designed to yield amplicons of 100 to 300 bp under the following criteria: primer length of 22 to 25 bp; GC content of 40 to 70%; melting temperature (Tm) of 45 to 65 °C; and program default values for the remaining parameters. When mapping primers back to the genome, we allowed up to 5 nucleotide mismatches at the 5' end of the primer, but no mismatches at the 3' end, and required a minimum of 80% overall match homology. For a given primer pair, we considered that a specific amplicon was generated if both forward and reverse primers mapped to the same chromosome/scaffold. Eight pepper cultivars (GRIF 9194, GRIF 9316, PI 439309, PI 439489, PI 631142, VI029462, VI062180, and LJ091) were selected for validating the primers via PCR and electrophoresis. A total of 240 SSR markers, distributed among 12 linkage groups, were screened.
Finally, 33 markers were identified as being evenly distributed along the linkage groups, which produced clear bands with high polymorphism. These markers were further used to analyze all of the accessions, as detailed in Table 1.
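The repeat thresholds described above can be illustrated with a short script. The following is a minimal Python sketch of perfect-SSR detection using the stated minimum repeat numbers (6, 5, 4, 4, and 4 for di- to hexanucleotides). It is not the MISA implementation: the function name `find_ssrs` is hypothetical, compound SSRs are ignored, and Python is used here only for illustration.

```python
import re

# Minimum repeat counts per motif length, as stated in the text
# (dinucleotide: 6, trinucleotide: 5, tetra-, penta-, hexanucleotide: 4).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Simplified illustration (hypothetical helper, not MISA itself):
    compound and interrupted microsatellites are not handled.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit,
            # e.g. "ATAT" is really the dinucleotide "AT".
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            repeats = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "AT" * 7 + "CCGA" + "TTG" * 5 + "ACGT"
    for start, motif, repeats in find_ssrs(demo):
        print(f"position {start}: ({motif}){repeats}")
```

Positions are 0-based. A run of five AT units is not reported because it falls below the dinucleotide threshold of six.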

PCR Amplification and Polyacrylamide Gel Electrophoresis
PCR was carried out in a final reaction volume of 10 µL (1 µL DNA, 1 µL forward primer, 1 µL reverse primer (100 ng/µL), 5 µL of 2 × T5 Super PCR Mix, and 2 µL nuclease-free water) on a thermocycler (Applied Biosystems, StepOnePlus ABI7500) with the following reaction conditions: initial denaturation at 94 °C for 3 min; 30 cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 5 min. The PCR amplicons were assessed by electrophoresis on an 8% polyacrylamide gel in 0.5 × TBE buffer at a constant voltage of 180 V, 150 mA, and 50 W for three hours, alongside a 100 bp DNA ladder. After electrophoresis, the gel was carefully retrieved, rinsed with sterile water, and incubated in 1 L of 1% silver nitrate (AgNO3) for 20 min with shaking. Following the incubation, the gel was washed 3 times with sterile water and immersed in 1 L of developing solution (1.5% sodium hydroxide (NaOH) and 4 mL formaldehyde (HCHO)) until the bands were clearly visible (approximately 5 min). The alleles in each pepper variety were then determined visually from the clearly visible bands.

Data Statistics and Analysis Results
Amplified fragments at each microsatellite locus were scored as 1 (presence) or 0 (absence), and data matrices were constructed accordingly. Then, using POPGENE (version 1.32) software, several indices were calculated: the observed number of alleles (Na), effective number of alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), and the Shannon information index (I) [36]. The major allele frequency (MAF), polymorphism information content (PIC), and gene diversity index were calculated using PowerMarker (version 3.0) software [37]. In addition, cluster analysis of the germplasm was based on the Nei genetic distance [38], and the neighbor-joining (NJ) method was used to construct a dendrogram in PowerMarker (version 3.0). The dendrogram was visualized and edited using MEGA7 (version 7.0) [39].
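The indices listed above follow from allele frequencies at each locus. The sketch below, in Python, shows the standard textbook estimators under stated assumptions: the input is a list of diploid genotypes per locus (an assumed representation, not the 1/0 matrix format used in the study), the function name `locus_statistics` is hypothetical, and the plain uncorrected formulas are used, so the software packages named above may return slightly different values where they apply small-sample corrections.

```python
import math
from collections import Counter


def locus_statistics(genotypes):
    """Per-locus diversity statistics from diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per accession.
    Hypothetical helper using uncorrected estimators.
    """
    alleles = Counter(a for pair in genotypes for a in pair)
    total = sum(alleles.values())
    freqs = [count / total for count in alleles.values()]
    homozygosity = sum(p * p for p in freqs)

    # PIC (Botstein et al. 1980):
    # 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs))
                for j in range(i + 1, len(freqs)))
    return {
        "Na": len(freqs),                      # observed number of alleles
        "Ne": 1 / homozygosity,                # effective number of alleles
        "MAF": max(freqs),                     # major allele frequency
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": 1 - homozygosity,                # expected heterozygosity
        "I": -sum(p * math.log(p) for p in freqs),  # Shannon index
        "PIC": 1 - homozygosity - cross,
    }


if __name__ == "__main__":
    demo = [("A", "A")] * 6 + [("A", "B")] * 2 + [("B", "B")] * 2
    for name, value in locus_statistics(demo).items():
        print(name, round(value, 4))
```

As a rough cross-check, a biallelic locus with a major allele frequency of 0.9858 gives Ne ≈ 1.029 and PIC ≈ 0.028 under these formulas, which is in line with the values reported for Chr7SSR2 in the Results.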

Distribution of the SSRs in the Capsicum frutescens Genome
A search of the C. frutescens genome sequences identified a total of 278,425 SSR loci. (AT)n, (TTG)n, (AAAT)n, (AAATA)n, and (TATAGA)n were the most common motifs among the dinucleotide to hexanucleotide repeats, respectively. The abundance of the SSR repeat types differed. For example, among the 1638 SSR repeats, trinucleotides and dinucleotides were dominant, accounting for 57.1% (158,967) and 34.1% (94,916), respectively. The remainder consisted of tetranucleotide, pentanucleotide, and hexanucleotide repeat motifs, accounting for 5.9% (16,351), 1.6% (4531), and 1.3% (3660), respectively. Taken together, the majority of the SSR repeat motifs across the genome sequences were trinucleotide repeats, and hexanucleotide repeats were the fewest, as illustrated in Figure 1A. The frequencies of the individual SSR motif types across the C. frutescens genome were also determined. Among the dinucleotide motifs, AT/TA was the most common (69.91%; 66,356), followed by AC/GT (9.14%; 8671) and TC/GA (7.88%; 7476), whereas CG/GC motif repeats were rarely observed (0.05%). The trinucleotide repeats comprised 30 different motif types. The predominant motifs were TTG/CAA and AAT/ATT, which accounted for 12.41 and 10.02%, respectively (Figure 1B). In addition, AAAT/ATTT (13.85%) was the predominant tetranucleotide repeat (Figure 1C).
The statistical data showed that the SSR loci were widely distributed on all 12 chromosomes of the C. frutescens genome. These were mainly found in Chr3 (25, In addition, 16,522 SSR loci could not be assigned to any chromosome. The density of SSR loci on each chromosome ranged from 91.27 to 104.13 SSR loci/Mb. Although Chr3 carried the highest number of SSR loci, its density was only the third highest, at 97.45 SSR loci/Mb. Conversely, although Chr2 carried the fewest SSR loci, its density was the highest overall (104.13 SSR loci/Mb), as shown in Figure 1D.

Analysis of the SSR Repeat Motif Types and Frequencies
As shown in Figure 1E, the distribution of the SSRs was also examined with respect to the number of repeat units. For all SSR types, the SSR frequency decreased as the number of repeat units increased. The decline was more gradual for dinucleotides than for the longer repeat motif types. Most pepper SSR loci had between 4 and 10 repeats, with only a few exceeding 10. The most frequent repeat number was 6, accounting for 26.20% (45,076). In this analysis, dinucleotides were the most abundant, accounting for 55.16% (94,916), whereas trinucleotides, tetranucleotides, pentanucleotides, and hexanucleotides accounted for 30.58% (52,614), 9.50% (16,351), 2.63% (4531), and 2.13% (3660), respectively.
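The percentages in this section and in the preceding one can be reproduced from the reported counts. The following Python sketch recomputes both sets of shares; `class_shares` is a hypothetical helper name, and Python is used only for illustration. The calculation also shows that the two sets rest on different totals (278,425 loci for the genome-wide motif classes and 172,072 loci for the repeat-number analysis); the text does not state the reason for this difference.

```python
def class_shares(counts):
    """Return (total, {class: percentage rounded to 2 decimals})."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
    return total, shares


# Counts reported for the genome-wide motif classes (Figure 1A).
genome_wide = {"di": 94_916, "tri": 158_967, "tetra": 16_351,
               "penta": 4_531, "hexa": 3_660}

# Counts reported in the repeat-number analysis (Figure 1E).
repeat_number = {"di": 94_916, "tri": 52_614, "tetra": 16_351,
                 "penta": 4_531, "hexa": 3_660}

if __name__ == "__main__":
    for label, counts in (("genome-wide", genome_wide),
                          ("repeat-number", repeat_number)):
        total, shares = class_shares(counts)
        print(label, total, shares)
```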

Primer Design of the Pepper Plant Genomic SSR Markers
For primer design in the current study, a total of 240 SSRs distributed on different chromosomes were selected, and their reliability was evaluated on eight pepper cultivars. Of the primer sets tested, 41 amplified successfully and showed length polymorphisms. The remaining 199 primer sets were non-polymorphic, produced non-specific amplification with ambiguous bands, or failed to amplify, as evidenced by the gel results. Of the 41 amplified primer sets, only 33 (13.75% of the 240 tested) generated bands that were both polymorphic and unambiguous on the gel. Those primer sets were therefore selected for further analysis, as detailed in Table 1. Among the 33 SSR loci, 1 was a dinucleotide repeat, 11 were trinucleotide repeats, 4 were tetranucleotide repeats, 4 were pentanucleotide repeats, and 13 were hexanucleotide repeats.

Polymorphism Analysis with SSR
The results of polyacrylamide gel electrophoresis for several highly polymorphic SSR markers are shown in Figure 2C. In total, 91 alleles were obtained with the aforementioned 33 SSR primers. Among those markers, the number of alleles (Na) per locus ranged from 2 (Chr1SSR8, Chr2SSR12, Chr2SSR14, Chr3SSR5, Chr3SSR6, Chr4SSR11, Chr5SSR13, Chr5SSR14, Chr6SSR12, Chr6SSR20, Chr7SSR2, Chr8SSR6, Chr9SSR16, Chr10SSR14, Chr11SSR14, and Chr11SSR16) to 6 (Chr4SSR20), with an average of 2.8 alleles. The effective number of alleles (Ne) per locus ranged from 1.0288 (Chr7SSR2) to 3.6226 (Chr7SSR15), with an average of 1.7055. The major allele frequency (MAF) ranged from 0.3231 (Chr7SSR15) to 0.9858 (Chr7SSR2), with an average of 0.7547. The observed heterozygosity (Ho) ranged from 0.000 (Chr1SSR8, Chr4SSR11, Chr6SSR17, Chr7SSR2, Chr10SSR12, Chr11SSR4, and Chr11SSR14) to 0.9863 (Chr5SSR16), with an average of 0.0989. The expected heterozygosity (He) ranged from 0.0281 (Chr7SSR2) to 0.7264 (Chr7SSR15), with an average of 0.3313. The Shannon information index (I) ranged between 0.0744 and 1.3273, with an average of 0.5758, and the polymorphism information content (PIC) ranged from 0.0276 to 0.6718, with a mean value of 0.2893. The mean gene diversity across all 147 accessions was 0.3300, as shown in Table 2. The markers differed in their level of polymorphism. For example, primer Chr7SSR15 was the most informative (PIC value: 0.6718), whereas primer Chr7SSR2 was the least informative (PIC value: 0.0276). Taken together, the selected SSR markers were effective in detecting genetic diversity.

Genetic Diversity Analysis
The genetic diversity and phylogenetic relationships were determined using the 147 collected pepper accessions from 25 different countries around the world (Figure 2 and Supplementary File S1). Using the Nei genetic distance and the neighbor-joining method, a dendrogram was constructed based on the genotypes detected by the newly developed SSR markers (Figure 3). The accessions clustered into seven main groups (designated in this study as Groups I, II, III, IV, V, VI, and VII), which comprised 18, 37, 32, 20, 21, 5, and 14 members, respectively.
The dendrogram not only reflected the phylogenetic relationships of the cultivars, but was also consistent with their places of origin. Notably, Group I formed a distinct branch. Group II consisted of 37 accessions, the majority of which were from Latin America, with the exception of 3 accessions (PI 281419 and PI 281420 from the Philippines, and PI 281347 from India). Group III comprised 32 accessions collected from 15 countries. All of the materials collected in Africa clustered in that group, and 4 Chinese cultivars were also assigned to Group III. Group IV consisted of 20 accessions, of which 7 were from Guatemala, 7 were from South America, 3 were from the United States, 2 were from Mexico, and 1 was from an unknown country, which this study speculated may have been of North American origin. Group V was composed of 21 accessions from various geographical origins, such as North America, South America, and Asia. The five members of Group VI included four cultivars from Costa Rica and one from El Salvador. In addition, the 14 accessions clustered together in Group VII were derived from Latin America (Figure 3). These results suggested that the newly developed SSR markers were both stable and suitable for assessing the genetic relationships among C. frutescens cultivars.
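The distance calculation underlying such a dendrogram can be sketched as follows. The text states only that a "Nei genetic distance" was used, so this Python sketch assumes the D_A distance of Nei et al. (1983), computed between individual multi-locus genotypes; that choice, the genotype representation, and the function names `allele_freqs`, `nei_da_distance`, and `distance_matrix` are all assumptions made for illustration. The neighbor-joining step itself is not shown.

```python
import math


def allele_freqs(genotype):
    """Allele frequencies within one genotype, e.g. ('A', 'B')."""
    freqs = {}
    for allele in genotype:
        freqs[allele] = freqs.get(allele, 0.0) + 1 / len(genotype)
    return freqs


def nei_da_distance(sample_x, sample_y):
    """D_A = 1 - (1/L) * sum over loci and alleles of sqrt(x * y).

    sample_x, sample_y: lists of per-locus genotype tuples in the same
    locus order (assumed input format).
    """
    if len(sample_x) != len(sample_y):
        raise ValueError("samples must be genotyped at the same loci")
    shared = 0.0
    for gx, gy in zip(sample_x, sample_y):
        fx, fy = allele_freqs(gx), allele_freqs(gy)
        shared += sum(math.sqrt(fx[a] * fy[a]) for a in fx if a in fy)
    return 1 - shared / len(sample_x)


def distance_matrix(samples):
    """Symmetric matrix of pairwise D_A distances."""
    n = len(samples)
    return [[nei_da_distance(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    x = [("A", "A"), ("B", "C")]
    y = [("A", "A"), ("B", "B")]
    z = [("D", "D"), ("E", "E")]
    for row in distance_matrix([x, y, z]):
        print([round(d, 4) for d in row])
```

A matrix of this kind is what a neighbor-joining routine would take as input; accessions sharing no alleles at any locus are at the maximum distance of 1.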

Figure 3. Dendrogram constructed from the genotypes at 33 SSR markers using the neighbor-joining method. The icons indicate the source of the material, as shown in Figure 2B.

Discussion
The genome wide analysis of SSRs could provide the opportunity to decipher the optimal functions of these repeats in the regulation and organization of a genome. Also, the potential uses of these markers, such as diversity and population analyses, evolutionary history, and genome and comparative mapping, are currently being explored [19][20][21][40][41][42].
Sequencing has made high-quality genome sequences of C. frutescens available, providing the opportunity to develop suitable SSR primers. In the present study, dinucleotide and trinucleotide repeats were the most abundant, together accounting for 85.74%. The remainder of the repeats (14.26%) consisted of tetranucleotides, pentanucleotides, and hexanucleotides, as illustrated in Figure 1E. This pattern has also been reported in other plants, such as Radix codonopsis, Anthurium andraeanum, and Camellia sinensis [43][44][45]. Nevertheless, a few reports have found a high abundance of tetranucleotide repeats in some plants, such as Cucumis sativus, Medicago truncatula, and Vitis vinifera [46][47][48]. These differences may be due to the different criteria adopted for SSR identification. Moreover, examination of the SSR motif frequencies showed that the distributions of the dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide repeats were generally skewed toward lower numbers of repeat units, indicating that short repeat arrays predominate in the pepper genome. Higher repeat numbers were found in dinucleotide and trinucleotide SSRs, whereas they were rare or absent among the tetranucleotide, pentanucleotide, and hexanucleotide SSRs (Figure 1A). The same trends have been observed in other plants, such as citrus, watermelon, and tea [18,19,45]. This may be due to the obvious differences in the frequencies and types of SSR motifs. In the pepper genome, this study found that AT/TA was the most common dinucleotide motif, while the CG/GC motif was very rare (Figure 1B).
These findings were consistent with the motif frequencies found in cucumber, strawberry, maize, Radix codonopsis, potato, plum, watermelon, and horseradish [19,43,[48][49][50][51][52][53]. However, our results differed greatly from the motif frequencies observed in rice, citrus, onion, and Artemisia frigida [18,[54][55][56], where AG/CT has been found to be the most dominant type. Similarly, TTG, AAT, and ATT were the prevailing trinucleotide motifs in this study, whereas CCG and AGG are the predominant motifs in monocotyledons such as barley, rice, and corn. Thus, the number of SSRs, along with their structure and repeat motifs, differs greatly among plant species.
SSR markers are highly beneficial in population genetics and molecular breeding. However, their effectiveness relies mainly on marker quality and the accuracy of the experimentation [25]. In the present study, 240 selected SSR markers were screened, resulting in 33 unique markers. When used to evaluate the 147 pepper accessions, these markers produced clear and unambiguous amplification bands (Table 1). The screened polymorphic SSR primers accounted for 13.8% of the total. Previously, Li et al. obtained 17 pairs of polymorphic SSR primers with clear bands and high polymorphism from 152 pairs of SSR primers covering 12 chromosomes [33]. These accounted for 11.2% of the total number of SSR primers, which was slightly lower than the proportion obtained here. Liu et al. evaluated 85 pairs of SSR primers and selected 12 polymorphic pairs [57]. These accounted for 14.1% of the total, which was similar to the result of this study. Wu et al. used three different peppers as templates to select 65 pairs of polymorphic SSR primers from 153 pairs of SSR primers [42]. These accounted for 42.5% of the total number, which was substantially higher than that obtained in this study. Therefore, the proportion of polymorphic SSR primers screened in this study was relatively low, which may be attributed to the small differences among these peppers. However, although the proportion of polymorphic primers was low, the results could still be used to analyze the genetic diversity of the pepper population. Many SSR markers have been developed for pepper and mapped to linkage groups [58][59][60], which provide a key basis for analyzing pepper genetic diversity. However, factors such as the number, size, and types of SSR markers, the frequencies of the SSR motifs, and the various sampling schemes apparently result in differences in estimated genetic diversity [25,61].
Continued evaluation of genetic diversity is of major significance for the collection and efficient utilization of germplasm resources. Highly polymorphic and stable markers are prerequisites for studying genetic relationships and diversity. Nevertheless, the SSR loci in this study showed less diversity, as evidenced by lower polymorphism information content and gene diversity, than in earlier reports. For example, the study conducted by Nicolaï [62] reported a PIC of 0.67 and a gene diversity of 0.7, which were 0.38 and 0.37 higher than those of the current study (PIC: 0.29; gene diversity: 0.33). This may have been due to the number and types of samples tested. In the aforementioned study, 1352 accessions from 89 countries were utilized, including 11 species of Capsicum, whereas this study examined only a single species, C. frutescens. By comparison, the number of accessions used in Li's report [33] was similar to that in this study, and the PIC was also similar (slightly higher than in this study), but the number of markers differed. This was taken as support for the hypothesis that differences in genetic diversity are influenced by the number of SSR markers.
Previously, researchers have reported the genetic diversity of several pepper (Capsicum spp.) collections, including Capsicum chinense and Capsicum annuum. In 2016, 71 C. chinense accessions from different Brazilian geographic regions were analyzed using fruit morphological descriptors and AFLP molecular markers [63]. The results showed no association between the morphological descriptors and the AFLP markers [63]. In the same year, researchers investigated patterns of molecular diversity using 48 transcriptome-based single nucleotide polymorphisms (SNPs) in a large germplasm collection comprising 3821 accessions. Among the 11 species examined, Capsicum annuum showed the highest genetic diversity (HE = 0.44, I = 0.69), whereas the wild species C. galapagoense showed the lowest genetic diversity (HE = 0.06, I = 0.07). The Capsicum germplasm collection was divided into 10 clusters (cluster 1 to 10) based on population structure analysis, and five groups (group A to E) based on phylogenetic analysis [64]. The dendrogram constructed in this study from 147 pepper accessions using the NJ method indicated that the genetic relatedness of the pepper cultivars clustered in the majority of the groups was in good agreement with their geographic origins. These results were also consistent with the previous findings reported by Luo et al. and Jia et al. [7,65]. Moreover, the geographical sources of the Group III and Group V materials were diverse, including not only Asian and African countries but also Latin American countries. Pepper plants of the same geographical origin were not strictly assigned to the same groups. For example, the eight pepper materials from China were placed in Group I and Group III. These findings suggested that pepper genotypes had undergone numerous complex migrations as a result of human migration, which had led to their adoption, acclimatization, and local selection.

Conclusions
In the present study, 278,425 SSRs were identified by searching the C. frutescens genome sequences. AT/TA, TTG/CAA, and AAAT/ATTT were the most common repeat motifs among the dinucleotide, trinucleotide, and tetranucleotide repeats, respectively. In the repeat-number analysis, dinucleotides were the most abundant, accounting for 55.16%. The genetic diversity of C. frutescens germplasm was investigated using 33 SSR markers, which were evenly distributed on all of the chromosomes. The 147 experimental materials used in this study were wild peppers with rich genetic diversity. Their valuable properties (resistance to pathogens, high yield, and so on) could provide excellent genes for developing pepper breeding accessions. They may also provide a basis for improving the existing cultivated pepper species and are important for expanding the narrow genetic basis of pepper breeding in China. In addition, the genome-wide identification and development of SSR markers could be useful in future research on C. frutescens, such as high-density genetic mapping, comparative genome mapping, and genome-wide association analyses.