The genus Cucurbita
(Cucurbitaceae) has five major domesticated species, which include C. argyrosperma, C. ficifolia, C. maxima, C. moschata
, and C. pepo,
which have been cultivated globally as an economic crop [1
]. Among them, C. moschata
is considered as the second most diverse species of the genus after C. pepo
, and is closely related to C. argyrosperma
in genomic status [2
]. C. moschata
exhibits remarkable morphological diversity in its fruit and seed types, and is cultivated and consumed in many parts of the world in forms of mature and immature fruit, seeds, young stems, tendrils, and flowers [6
]. C. moschata
is a versatile food source, and it has also been used as rootstock for vegetable grafting, particularly in Asia [6
In most of East Asian countries, grafted seedlings are used routinely for commercial cucumber production for the purpose of overcoming continuous cropping disorders, escaping Fusarium
wilt, overcoming the increase of temperature stress tolerance, and contributing to the production of high-quality fruit [8
]. As cucumber stocks, fig leaf gourds (C. ficifolia
), pumpkins (C. moschata
), and Shintoza (C. maxima
× C. moschata
) are mainly used [9
]. In Korea, fig leaf gourds are mainly used for cold season cultivation and pumpkins and Shintoza are generally used for warm season cultivation. Among them, pumpkins are used as bloomless rootstock, because they inhibit the formation of blooms on the fruit’s surface, which makes it possible to produce high quality cucumbers with a shiny fruit skin [10
]. Bloomless rootstocks are particularly preferred in China even in the cold season, because the blooms are easily visible on the fruit’s skin of the long-type Chinese cucumbers, which have a uniform dark green color. In addition to the low-temperature tolerance, the increased resistance to diseases, pests, and the salt tolerance due to climate change and continuous cropping have been recognized as major breeding issues. However, since the genetic background of the current rootstock cultivars is narrow, it is necessary to develop gene pools for the breeding of new value-added varieties. However, it is required to understand the genetic structure and the diversity of the C. moschata
germplasm for this.
There have been few studies about the genetic diversity of wild C. moschata
germplasm until now, but there are many studies on the domesticated Cucurbita
germplasm available, especially with C. pepo
]. Even in the cases of the analysis of wild germplasm, the accessions consisted of small sizes of germplasm or landraces only from specific regions or countries [6
]. One of the most recent studies is about the genetic variability among 91 accessions from different regions of Brazil that used agro-morphological characteristics, such as seed related traits and carotenoids contents. They clustered the germplasm into 16 groups according to the phenotypic variance and suggested the promising accessions by the per se approach for the effective application to breeding programs [15
]. Another study showed the genetic divergence according to the elevation gradient in Mexico. They assessed the genetic data from 11 nuclear microsatellite loci and the mitochondrial locus of landraces from 24 localities [6
]. As such, the studies to date have mainly focused on the analysis of genetic relationships based on specific genetic patterns or morphological variations for limited genetic resources with a narrow geographic origin.
In this study, we gathered a large number of C. moschata germplasm from various origins and assessed the diversity with high quality SNPs that were distributed evenly among the pumpkin genome, which had a purpose to understand the genetic diversity and the population structure. Furthermore, we selected the compact size of the representative entries as a core collection and validated the core collection using the morphological traits related to the grafting and the flowering. The C. moschata core collection constructed in this study is thought to be efficiently applied to generate new gene pools and the consequent practical rootstock breeding.
spp.) are one of the vegetables that have great socio-economic importance. Many of the Cucurbita
germplasms have been cultivated over generations from family-based farmers in many countries. The selection of these germplasms has been practiced over time by these populations, which is associated with the exchange of seeds between them, and the natural occurrence of the hybridization in the germplasm of this species has contributed to its increased variability [15
Among the genus Cucurbita
, we have focused on one of the major domesticated species, the C. moschata
, which are cultivated globally and used for rootstock in the cucumber cultivation. Compared to other major crops, the breeding research on rootstock is still in the beginning stage worldwide, and the exploitation of useful genetic resources is still insufficient, so securing genetic resources that have agricultural value should be a top priority with this crop. In order to do this, we gathered a large number of germplasms that have a diverse origin from 42 countries and extracted the high quality 2071 SNPs throughout the 20 C. moschata
chromosomes. It is a remarkable size of germplasm and genetic markers compared to the previous studies, in which only three morphological characters or 11 SSR markers that contained 122 alleles were used to assess the genetic diversity in Brazil or Mexico [6
Among the 610 C. moschata
germplasm used in this study, 15 accessions showed different morphology in the species level during the phenotypic evaluation, even though they are classified as C. moschata
in the passport data. It is also common in other studies using genetic resources from various institutions and can occur due to misidentification in species level or by interspecific hybrid [17
]. Therefore, to reduce the experimental errors, we confirmed also in the genetic differences of 15 accessions from both in the PCA and in phylogenetic analysis as a separate group. Finally, a total of 595 accessions, excluding 15 uncertain accessions, were used for further genetic analysis in this study.
According to the structure analysis in this study, three distinct clusters were identified within the worldwide C. moschata germplasm collection. The first cluster mainly consisted of East Asian accessions, which includes the biggest number of accessions. Additionally, it is divided into two sub-groups in the phylogenic tree, which group 1 showed a genetic homogenous that mostly consisted of Korean landrace, whereas group 2 included some Korean commercial cultivars that showed a genetic admixture. According to the values of the FST and the diversity index, this cluster showed the biggest difference to the others, and it had the smallest genetic diversity. These results could suggest that the germplasm in East Asia have a narrow genetic diversity, because they have been locally isolated and developed under independent breeding using limited genetic resources, even though it includes the largest number of accessions.
The second cluster consists of the African and the South Asian accessions, which apparently appear to be closely related to each other. And the last cluster included accessions mostly from the America continent where more than 63% of them were from Mexico. Even though it has not yet been clarified, Mexico is believed to be the origin of the C. moschata
due to its archaeological clues and the presence of a various landrace that represents the high genetic diversity [18
]. According to the previous studies, we expected that the last cluster would have the highest genetic diversity. However, based on the results of genetic analysis, it showed the biggest genetic diversity in the second cluster rather than the third cluster, that Mexico belongs to. Therefore, Africa and South Asian countries, especially India, were also thought to need to be carefully considered as the origin of C. moschata
Based on the genetic diversity of the entire germplasm, we selected 67 representative accessions to establish the core collection of the C. moschata for the final purpose. In order to construct the core collection in this study, the large number of 2071 high-quality SNPs were considered, and phenotypic validation were also performed to ensure that we obtained a good representative.
The traits assessed in this study are important quality factors for the grafting, the cultural management, and the seed production, which can be considered as important targets in the rootstock breeding. The seedling characteristics of the rootstock, such as the small cotyledon and the short hypocotyl are taken into the primary target trait in the rootstock breeding. Since the main purpose of this study is to select the core collection that can be applied to the rootstock breeding through creating new gene pools, the above phenotype data would be practically applicable to choose the appropriate parental germplasm.
Actually, it takes a lot of time and effort to obtain consistent phenotypic data from a large number of germplasms of C. moschata
because of their growth habits and the requirement of a large cultivation space. It could be explained that research that uses expressive phenotypic data obtained under the same environmental conditions is very rare [13
]. Nevertheless, the phenotype evaluation in more than 600 accessions has been conducted over three years in this study, which was not only used as indicators in the selection of core collection but also would be useful for various further association studies. According to the results, the phenotypic distribution of the core collection mostly covered the entire germplasm variation, and it was thought that the core collection could represent the entire collection even in the level of the morphological diversity.
The large amount of genotypic and phenotypic data from this study would help to understand the genetic structure and the diversity of the C. moschata, and it could be practically applied in molecular breeding through further association studies.
4. Materials and Methods
4.1. Plant Materials and Phenotypic Evaluation
The total of 610 C. moschata
accessions, which were originally from 42 countries, were collected from the National Agrobiodiversity Center Rural Development Administration (RDA) in Korea and the Agricultural Research Service (ARS) of United States Department of Agriculture (USDA). The scientific name and the origin of each accession were identified by referring to the passport data of the offering institute. However, some of them exhibited characteristics that differed from the general characteristics of the species. Additionally, in the genotype we identify genetic differences based on population structure, phylogenetic tree, and PCA. Therefore, it was excluded from the core collection construction. The number of SNPs and accessions used in each analysis were mentioned in the results section. The accession name, and the geographic locations are listed in Table S1
Over a three-year period (2018–2020), the entire germplasm collection was planted in plastic houses at Anseong-si, Republic of Korea, and three traits of seedling and two traits related with the seed production were evaluated. Phenotypes related to the seedlings, which included the cotyledon length, the cotyledon width, and the hypocotyl length, were investigated from the 10th to the 12th day after sowing using calipers. The internode length and the number of female flowers in ten nodes were counted on the 40th day after transplanting. All the traits were investigated with two replicates in two plants per an accession.
4.2. The DNA Extraction and the SNP Set Construction Using the GBS Method
The genomic DNA was extracted from the samples using the CTAB method [16
], and they were diluted to 80 ng/µL in distilled water. The GBS library was conducted using the EcoR
I; enzyme set with an EcoR
I; adaptor. The libraries were pooled in seven tubes. The contents of the tubes were sequenced in separate lanes using the HiSeq 2000 platform (Illumina, San Diego, CA, USA) at Macrogen, which is located in Seoul, Republic of Korea. All the 101 bp reads were trimmed using 96 different barcode sequences as previous described [21
]. The filtered reads were aligned to the C. moschata
) as a reference genome sequence [22
]. The reads mapping, the sorting, and the grouping were performed using BWA-MEM (0.7.12), SAMtools (1.1), and Picard (1.119). A GATK Unified Genotyper v.3.3-0 was used to call the SNPs [23
]. The raw SNPs were filtered to remove the mono and the tri-allelic SNP types. The final SNP set was constructed after the filtering, which followed the following conditions. The minor allele frequency > 0.05, the SNP coverage > 0.6, and the inbreeding coefficient (IF) > 0.8.
4.3. The Genetic Analyses
In order to analyze the population structure of the entire germplasm (610 accessions) used in this study, we used a model based genetic clustering algorithm that is implemented in the STRUCTURE program ver. 2.3.4 [24
]. The number of sub-populations (ΔK) was determined using the ad-hoc
statistical method [25
], which is based on the rate change in the log probability of the data between the successive K values. Ten independent runs for the K values that ranged from 1 to 15, were performed with a burn-in length of 50,000, which was followed by 50,000 iterations.
The phylogenetic trees were produced using genotyping data with 2071 SNP markers that used both the unweighted neighbor-joining method and the hierarchical clustering method based on the dissimilarity matrix calculated with the Manhattan index, which was implemented in the DARwin software (version 6.0.9). The entire germplasm was divided into subgroups based on the major nodes of the phylogenetic tree.
The F-statistics, which include the observed number of effective alleles (Ne), the heterozygosity (HO), the expected heterozygosity (HE), Shannon’s diversity index (I), the fixation index (F), and the genetic differentiation (FST), were used for the analysis and the comparison of the diversity. Indices Ne, HO, HE, I, F, and FST were calculated using GenAlEx 6.502. The analysis of the molecular variance (AMOVA) was conducted in order to detect the genetic variance within and among the population using GenAlEx ver 6.502.
4.4. The Core Collection Selection and the Evaluation
In order to establish a core collection, the representative entries were selected based on the genotypic diversity analysis method of Lee [17
]. The representative accessions were selected based on the advanced M strategy using a modified heuristic algorithm implemented in the PowerCore software version 1.0 [26
]. In order evaluate the efficiency of the core entries from the whole accessions, five diverse traits were used to compare them. The phenotype data is presented as the mean ± SE.