Assessment of the Genetic Diversity of the Breeding Lines and a Genome Wide Association Study of Three Horticultural Traits Using Worldwide Cucumber (Cucumis spp.) Germplasm Collection

Cucumbers are an economically important vegetable crop used worldwide for fresh consumption and processing. In this study, we used 264 accessions consisting of worldwide wild germplasm and advanced breeding lines to assess the genetic diversity of, and the genetic relationships within, the germplasm collection. A genotyping-by-sequencing (GBS) approach was applied to obtain dense genome-wide marker coverage (12,082 SNPs) and to construct a high-density haplotype map. Several population stratification methods were applied, and the collection was divided into three subgroups that reflected geographic origin. In the phylogenetic analysis, the breeding lines were separated from the wild germplasm, and two distinct groups were found within the breeding lines. One of these groups consisted mainly of East Asian varieties, which showed a uniquely homogeneous genotype pattern. Using this germplasm collection, three important horticultural traits of cucumbers (powdery mildew resistance, spine color, and fruit stalk-end color) were evaluated and used in a genome-wide association study (GWAS). All of the significant SNPs for powdery mildew resistance, together with two novel candidate genes (Csa5G453160 and Csa5G471070), were identified on chromosome 5 in this natural population, in the region where major QTLs from various bi-parental populations have been reported. Furthermore, two candidate genes, Csa1G006300 and Csa3G824850, and four candidate genes, Csa2G368270, Csa3G236570, Csa5G175680, and Csa6G448170, were identified for the fruit stalk-end color and the spine color, respectively. These results are expected to support the development of molecular markers for horticultural traits in cucumbers.


Introduction
Cucumbers (Cucumis spp.) are an economically important vegetable crop worldwide and are used for salads, pickles, and fresh consumption [1]. They are rich in dietary fiber; vitamins, including vitamins B, C, and K; antioxidants; and minerals, such as potassium and magnesium [2]. Cucumbers originated in India, where the wild ancestor C. sativus var. hardwickii, which bears round and bitter fruit, grows [1,3,4]. During the last 3000 years, various varieties adapted to local environmental conditions and food cultures have differentiated [4,5]. Modern cucumber [...] Furthermore, high-quality genome-wide SNPs obtained via genotyping-by-sequencing (GBS) were applied to identify the genetic regions significantly associated with three horticultural traits of cucumbers (powdery mildew (PM) resistance, the fruit's stalk-end color, and the spine color) through genome-wide association studies (GWAS).

Plant Materials
A cucumber germplasm collection of 264 accessions, which included local wild varieties and breeding lines, was assessed in this study (Table S1). This population consisted of four species collected from 37 countries. Most of the accessions belong to C. sativus (257 accessions), and the other three species, C. anguria, C. myriocarpus, and C. zeyheri, accounted for 3, 3, and 1 accessions, respectively. Of the 264 accessions, 100 are advanced breeding lines that were developed for commercial breeding, and the other 164 are wild germplasm accessions that were systematically sampled from the USDA National Plant Germplasm System (NPGS) based on their diversity.

Phenotypic Evaluation and Correlation Analysis
Over a two-year period (2018-2019), the cucumber germplasm was planted in two seasons, spring and autumn, in plastic houses at Anseong-si, Korea. Two replications with three plants of each accession were evaluated throughout the growing period. For the GWAS, three important horticultural traits (powdery mildew resistance, the fruit's stalk-end color, and the spine color) were evaluated in four different environments.
The resistance to powdery mildew at the seedling stage was evaluated in addition to the field resistance. Seedlings with fully expanded cotyledons were inoculated three times at three-day intervals with conidial suspensions collected from diseased plants. The density of the spore suspension was adjusted to approximately 1 × 10^5 sporangia per mL prior to the inoculation. After the inoculation, the seedlings were kept in humid plastic tunnels for a day. The resistance index was scored 2 to 3 weeks after the inoculation on a scale of 1-9, where 1: wilting and death of the whole plant (completely susceptible); 3: infection of the whole leaf (susceptible); 5: infection of less than 30% of the leaf area (intermediate resistant); 7: several infected spots on the leaf surface (resistant); and 9: no visible symptoms (strongly resistant). The fruit's stalk-end color (shoulder color) was scored from 1 to 3 (1 = short green stalk-end, 2 = long green stalk-end, and 3 = whole green body) (Figure S1). The fruit's spine color was scored from 1 to 3 (1 = white, 2 = brown, and 3 = black).

Genomic DNA Extraction and GBS
Genomic DNA was extracted from the samples using the CTAB method [33] and diluted to 80 ng/µL in distilled water. The GBS libraries were constructed by double digestion with restriction enzymes (EcoRI/MseI), and a compatible set of 96 barcodes was used as previously described [34]. The digested DNA was ligated to adapters and amplified with TA primers. The libraries were pooled in three tubes, and the contents of each tube were sequenced in a separate lane on the HiSeq 2000 platform (Illumina, San Diego, CA, USA) at Macrogen (Seoul, Korea).

Reference-Based SNP Calling and Construction of the SNP Set
Raw 101-bp reads from the libraries were trimmed to a minimum length of 80 bp and filtered to a quality score > Q30. The filtered reads were aligned to the reference genome sequence of Cucumis sativus L. var. sativus cv. 9930 (Chinese Long) [11] using the Burrows-Wheeler Aligner program v.0.7.12 [35]. For the SNP calling and filtering, the GATK Unified Genotyper v.3.3 was used [36]. Raw SNPs were filtered to remove the mono- and tri-allelic SNP types. The final SNP set was then constructed by applying the following conditions: minor allele frequency (MAF) > 0.05, SNP coverage > 0.6, and inbreeding coefficient (IF) > 0.8.
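The three filtering conditions above can be sketched for a single biallelic SNP as follows. This is a minimal illustration, not the GATK-based pipeline used in the study. It assumes that "SNP coverage" means the fraction of accessions with a called genotype and that the inbreeding coefficient is computed as F = 1 − Hobs/Hexp; neither definition is stated in the text. The function names `snp_stats` and `passes_filters` are hypothetical.

```python
def snp_stats(genotypes):
    """Return (maf, call_rate, inbreeding_f) for one biallelic SNP.

    genotypes: one entry per accession, coded 0 (homozygous reference),
    1 (heterozygous), 2 (homozygous alternate), or None (missing call).
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 0.0, 0.0, 0.0
    call_rate = n / len(genotypes)           # assumed meaning of "SNP coverage"
    alt_freq = sum(called) / (2 * n)          # alternate allele frequency
    maf = min(alt_freq, 1 - alt_freq)
    h_exp = 2 * alt_freq * (1 - alt_freq)     # expected heterozygosity
    h_obs = called.count(1) / n               # observed heterozygosity
    # Assumed definition of the inbreeding coefficient: F = 1 - Hobs/Hexp.
    f = 1 - h_obs / h_exp if h_exp > 0 else 0.0
    return maf, call_rate, f


def passes_filters(genotypes, min_maf=0.05, min_call=0.6, min_f=0.8):
    """Apply the thresholds MAF > 0.05, coverage > 0.6, and IF > 0.8."""
    maf, call_rate, f = snp_stats(genotypes)
    return maf > min_maf and call_rate > min_call and f > min_f


# Toy genotype vectors (hypothetical data, ten accessions each).
inbred_snp = [0] * 6 + [2] * 3 + [None]          # no heterozygotes
outbred_snp = [0, 1, 1, 1, 2, 1, 0, 1, 2, 1]     # excess heterozygotes
monomorphic_snp = [0] * 10

print(passes_filters(inbred_snp))       # kept
print(passes_filters(outbred_snp))      # removed by the IF threshold
print(passes_filters(monomorphic_snp))  # removed by the MAF threshold
```

A high IF threshold keeps only markers that are almost fully homozygous, which is consistent with a collection of inbred lines and germplasm accessions.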

Population Structure Analysis
To analyze the population structure, we used a model-based genetic clustering algorithm [37] implemented in the STRUCTURE program version 2.3.4 [38]. The number of subpopulations (K) was determined using the ad hoc ΔK statistic, which is based on the rate of change in the log probability of the data between successive K values [39]. Ten independent runs for K values ranging from 1 to 10 were performed with a burn-in length of 50,000 followed by 1,000,000 iterations.
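The ΔK statistic described above can be sketched as follows. For each K, the absolute second-order difference of the mean log probability, |L(K+1) − 2L(K) + L(K−1)|, is divided by the standard deviation of L(K) across replicate runs. Computing the second difference from the run means is an assumption of this sketch, and the log-probability values are invented for illustration; they are not the values from this study. The function name `delta_k` is hypothetical.

```python
from statistics import mean, stdev


def delta_k(log_probs):
    """Compute delta-K for every K that has both neighbours K-1 and K+1.

    log_probs: dict mapping K to a list of ln P(D|K) values, one per
    replicate STRUCTURE run.
    """
    means = {k: mean(v) for k, v in log_probs.items()}
    result = {}
    for k in sorted(log_probs):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta-K is undefined at the ends of the K range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(log_probs[k])
        if sd > 0:
            result[k] = second_diff / sd
    return result


# Invented log probabilities for two replicate runs per K.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-800.0, -802.0],
    3: [-500.0, -501.0],
    4: [-480.0, -490.0],
    5: [-470.0, -475.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(best_k)
```

Because ΔK needs both neighbours, it cannot be evaluated at K = 1 or at the largest K tested, which is why the range of K values must extend beyond the expected number of clusters.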

Phylogenetic and the Principal Coordinate Analyses
The genetic relationships among the inferred subpopulations were investigated based on phylogenetic and principal component analyses. The phylogenetic trees were produced using the weighted neighbor-joining clustering method based on the dissimilarity matrix with the software DARwin 6.0.9 [40]. The principal component analysis (PCA) was performed using the "pcaMethods" library in R [41]. The values of each PC were used as covariates in the GWAS.
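The dissimilarity matrix that feeds the neighbor-joining tree can be illustrated with a small sketch. The text does not state which dissimilarity index was selected in DARwin, so the allele-sharing (simple matching) distance below is an assumption: for genotypes coded 0/1/2, two accessions differ by |x − y| alleles at each locus, and the distance is the mean proportion of non-shared alleles over loci called in both accessions. The function names are hypothetical.

```python
def dissimilarity(a, b):
    """Proportion of non-shared alleles between two diploid accessions.

    a, b: genotype lists coded 0/1/2 with None for missing calls.
    Loci missing in either accession are skipped.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("nan")
    return sum(abs(x - y) for x, y in shared) / (2 * len(shared))


def dissimilarity_matrix(genotypes):
    """Return the symmetric matrix of pairwise dissimilarities."""
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dissimilarity(genotypes[i], genotypes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Three hypothetical accessions genotyped at four SNPs.
toy = [
    [0, 2, 1, None],
    [0, 0, 1, 2],
    [2, 2, 2, 2],
]
dm = dissimilarity_matrix(toy)
print(dm[0][1])
```

A tree-building program would take such a matrix as input; the clustering step itself is not reproduced here.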

Genome-wide Association Study (GWAS) and Candidate Gene Identification
The GWAS based on the efficient mixed model association (EMMA) was conducted using the R package Genomic Association and Prediction Integrated Tool (GAPIT) [42] with the default settings. The significance threshold (−log10 p-value) of the GWAS was determined using the Bonferroni correction (FDR p-value < 0.05) [43] based on the number of independent SNPs in the population. Version 2 of the Chinese Long gff3 file (cucurbitgenomics.org/pub/cucurbit/genome/cucumber/Chinese_long/v2) was used to identify the genes near each significant SNP from the GWAS results. The candidate genes were selected by comparing them with previously reported genes or by assessing the 1 Mbp regions around the significant SNPs.
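The two steps above, setting a Bonferroni threshold and collecting genes around a significant SNP, can be sketched as follows. The number of independent SNPs used in the study is not reported, so the example treats all 12,082 SNPs as independent purely for illustration. The text also leaves open whether "1 Mbp region" means 1 Mbp in total or on each side; the sketch assumes 0.5 Mbp on each side through the `flank` parameter. The gene coordinates are hypothetical and are not taken from the Chinese Long annotation.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Return the -log10(p) significance threshold for n_tests tests."""
    return -math.log10(alpha / n_tests)


def genes_near_snp(genes, chrom, pos, flank=500_000):
    """List IDs of genes overlapping [pos - flank, pos + flank] on chrom.

    genes: iterable of (gene_id, chrom, start, end) tuples, e.g. parsed
    from the "gene" rows of a GFF3 file (1-based, inclusive coordinates).
    """
    lo, hi = pos - flank, pos + flank
    return [gid for gid, c, start, end in genes
            if c == chrom and start <= hi and end >= lo]


# Illustration only: all 12,082 SNPs treated as independent tests.
threshold = bonferroni_threshold(12082)
print(round(threshold, 3))

# Hypothetical gene coordinates for the window lookup.
toy_genes = [
    ("geneA", "Chr5", 15_900_000, 15_905_000),
    ("geneB", "Chr5", 16_700_000, 16_702_000),
    ("geneC", "Chr3", 16_000_000, 16_004_000),
]
nearby = genes_near_snp(toy_genes, "Chr5", 16_047_445)
print(nearby)
```

Using fewer, independent tests instead of the full SNP count lowers the threshold, which is the reason for basing the correction on independent SNPs.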

Genotypes and Genetic Variation of the Cucumber Germplasm Collection
The GBS data were aligned to the genome sequence of Cucumis sativus L. var. sativus cv. 9930 [11]. The GBS genotyping of 264 accessions generated 947,205 SNPs after trimming reads with a quality score of <30 (Table S2). Excluding the mono- and tri-allelic SNP types and applying the criteria MAF > 0.05, SNP coverage > 0.6, and IF > 0.8 resulted in a set of 12,082 high-quality SNPs, which were evenly distributed on the 7 chromosomes (Figure 1 and Table S2). Each SNP marker was named according to its physical position in the cucumber reference genome.

Population Structure and Genetic Diversity of the Cucumber Germplasm Collection
To understand the structure of the cucumber germplasm collection, a structure analysis of ten independent runs was performed with K values from 1 to 10. Based on the genetic clustering model, the ΔK method identified K = 3 as the best value for the SNP dataset, which indicates that three subpopulations best describe the cucumber germplasm collection used in this study. Cluster 1, cluster 2, and cluster 3 included 67, 61, and 136 accessions, respectively (Figure 2a and Table S3). It is interesting to note that at K = 4, which is not the optimal clustering, cluster 3 from K = 3 was divided into breeding lines and germplasm, which is similar to the results seen in the PCA and the phylogenetic analysis (Figure 2). Evaluation of the geographical distribution showed that most of the accessions in cluster 1 were from Europe, Central Asia, and West Asia, whereas the accessions that belonged to cluster 2 were from South/East Asia and India, which is considered to be the center of origin of cultivated cucumbers. Cluster 3 consisted mostly of accessions from East Asia, mainly Korea, Japan, and China. Accessions from North America were distributed across all three clusters (Figure 3). According to the Q value ratio, cluster 3 showed a homogeneous genetic background, whereas clusters 1 and 2 showed slight admixture (Figure 2a and Table S3).

Genetic Relatedness among the Population
The clustering pattern at the optimal K from the structure analysis was consistent with the phylogenetic tree and the PCA results. The phylogenetic analysis using the weighted neighbor-joining clustering method separated the germplasm collection into four clades (Figure 4a). From the top, clades 1 to 3 consisted mostly of accessions from cluster 3 of the structure analysis, and clade 4 consisted mostly of accessions from cluster 1 and cluster 2 (Figure 4a).
Most of the breeding lines used in this study were clustered in clade 2. In this group, the diverse East Asian breeding lines were strongly clustered. The Korean semi-white lines with a black spine (WB) and a white spine (WW) were closely related to the Korean solid green (HB) and the Chinese long green (CHS) (Figure 4b). Notably, some accessions from the WW were isolated in clade 3, where they were placed on distinctive branches (Figure 4a). Meanwhile, the other breeding lines, such as the Beit Alpha (BAF and BAP), the American pickling (API and APD), the Parthenocarpic slicer (PSL), and the Taiwanese slicer (TAS), formed a group separate from the East Asian breeding lines (Figure 4b and Table S4). Interestingly, some variety groups, including the CHS, the Thailand slicer (THS), and the Long Dutch green (EUR), were more closely related to the wild germplasm than to the other breeding lines (Figure 4a). When the phylogenetic analysis was performed with only the breeding lines, these accessions were clustered in a different group from the others (Figure 4b).
A similar pattern of phylogenetic relationships was observed in the principal component analysis (PCA). When all accessions were plotted, the first and the second axes explained 51.21% and 5.81% of the genotypic variance, respectively. Based on PC1, the breeding lines were strongly grouped and separated from the wild germplasm accessions (Figure 2b). The wild germplasm accessions were divided into three clusters according to the geographical distribution that was observed in the structure analysis (Figure 2a). Moreover, the breeding lines were closely grouped within each of the variety groups, as shown in the phylogenetic tree (Figure 2d and Table S4).
The genetic difference among the three clusters was estimated by the population divergence (FST). The largest difference (FST = 0.48) was found between the Eurasian (cluster 1) and the East Asian group (cluster 3). The second largest difference was identified between the Indian (cluster 2) and the East Asian group (cluster 3; FST = 0.35). The smallest difference was found between the Eurasian (cluster 1) and the Indian group (cluster 2; FST = 0.26). These results indicate that the genetic background of the East Asian group differs substantially from that of the other groups of cucumber germplasm.
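Pairwise population divergence of the kind reported above can be illustrated with a short sketch. The text does not name the FST estimator that was used, so Hudson's estimator, computed as a ratio of sums over SNPs, is an assumption chosen for its simplicity. The allele frequencies are invented, and `hudson_fst` is a hypothetical function name.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's FST between two groups as a ratio of sums over SNPs.

    freqs1, freqs2: per-SNP frequencies of the same allele in each group.
    n1, n2: number of sampled chromosomes per group (2 x accessions).
    """
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for sampling variance.
        num += ((p1 - p2) ** 2
                - p1 * (1 - p1) / (n1 - 1)
                - p2 * (1 - p2) / (n2 - 1))
        # Probability that two alleles drawn from different groups differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")


# Invented allele frequencies at two SNPs, 20 accessions per group.
diverged = hudson_fst([0.9, 0.8], [0.1, 0.3], 40, 40)
similar = hudson_fst([0.5, 0.5], [0.5, 0.5], 40, 40)
fixed = hudson_fst([1.0], [0.0], 40, 40)
print(diverged > similar)
```

Values near 0 indicate groups with similar allele frequencies, and values near 1 indicate nearly fixed differences; the sampling correction can make the estimate slightly negative when the groups do not differ.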

Two horticultural traits, the fruit's stalk-end color and the spine color, were evaluated in 2018 and 2019, and both evaluations gave the same results. Of the 264 cucumber accessions, 68 had fruit too poor to score for the stalk-end color; 40.8% of the entire germplasm showed a whole green skin color, 30.2% showed a short green stalk-end with a light green skin color, and only 3% showed a long green stalk-end (Figure 5b and Figure S1). For the spine color, 43 accessions had fruit in too poor a condition to score; almost half of the accessions (47.5%) had a white spine, followed by a black spine (31.3%) and a brown spine (4.5%) (Figure 5c).

GWAS and Effective SNPs Influencing Horticultural Traits of Cucumber
The GWAS was performed for three horticultural traits (PM resistance, the fruit's stalk-end color, and the spine color), and significant SNPs were identified for each trait. Five significant SNPs located in the 16 to 17 Mbp region of chromosome 5 were associated with PM resistance (Figure 6a). Three of these SNPs, S5_16047445, S5_16080922, and S5_16623037, were detected in both 2018 and 2019, and S5_16623037 showed the highest significance. In 2019, two additional significant SNPs, S5_17127151 and S5_17127216, were identified (Figure 6 and Table S5).

The fruit's stalk-end color showed an association with five SNPs. The most significant SNP was identified in the 32.5 Mbp region of chromosome 3 (−log10 p-value = 7.59), followed by two SNPs at 21.2 Mbp on chromosome 4, one SNP at 1.2 Mbp on chromosome 1, and one SNP at 14.1 Mbp on chromosome 3 (Figure S2 and Table S5).
The spine color was associated with six SNPs. The most significant SNP was S5_7311032 (−log10 p-value = 7.87), followed by SNP S3_15236835 and by two SNPs each in the 17.7 Mbp region of chromosome 2 and the 21.1 Mbp region of chromosome 6 (Figure S2 and Table S5).
A total of 182, 558, and 509 genes were identified within the 1 Mbp regions around the significant SNPs for powdery mildew resistance, the stalk-end color, and the spine color, respectively. Among them, Csa5G471070 and Csa5G453160 were nominated as candidate genes for PM resistance because significant SNPs were located within these genes (Figure 6b). In Csa5G453160, the significant SNP S5_16047445 was located in the second exon. The varieties carrying the C allele showed stronger resistance to PM than those carrying the T allele in the two-year assessment (Figure 6c). The SNP S5_16623037 was located in the intron between the 6th and 7th exons of Csa5G471070, and the germplasm carrying its T allele showed stronger resistance to PM than the germplasm carrying the A allele in both years (Figure 6d).
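The allele comparison described above amounts to grouping the resistance scores by the genotype at one SNP and comparing the group means. A minimal sketch follows, using invented alleles and scores on the 1-9 resistance scale (higher means more resistant); it does not reproduce the data behind Figure 6c or 6d, and `mean_score_by_allele` is a hypothetical helper.

```python
from statistics import mean


def mean_score_by_allele(alleles, scores):
    """Return the mean phenotype score for each allele at one SNP.

    alleles: one allele call per accession (None if missing).
    scores: matching phenotype scores (None if not evaluated).
    """
    groups = {}
    for allele, score in zip(alleles, scores):
        if allele is None or score is None:
            continue  # skip accessions without a genotype or a phenotype
        groups.setdefault(allele, []).append(score)
    return {allele: mean(values) for allele, values in groups.items()}


# Invented calls and resistance scores for six accessions.
toy_alleles = ["C", "C", "T", "T", None, "C"]
toy_scores = [7, 9, 3, 5, 9, 5]
by_allele = mean_score_by_allele(toy_alleles, toy_scores)
print(by_allele)
```

In practice the comparison would be run separately for each year, since the text reports the allele effect for both 2018 and 2019.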
Significant SNPs were found in two candidate genes, Csa1G006300 and Csa3G824850, for the fruit's stalk-end color. The SNP S1_1181919 was located in the 5th intron of Csa1G006300, and S3_32581537 was located inside the 4th CDS region of Csa3G824850 (Figure S2).
For the spine color, all six detected SNPs were located in genic regions. Two significant SNPs, S2_17762942 and S2_17762964, were located in the 1st exon of Csa2G368270, and two SNPs, S3_15236835 and S5_7311032, were located in introns of Csa3G236570 and Csa5G175680 on chromosomes 3 and 5, respectively. Furthermore, two SNPs on chromosome 6, S6_21138663 and S6_21138684, were located in the 2nd exon of Csa6G448170 (Figure S2).

Discussion
In this study, we evaluated 264 Cucumis accessions that consisted of wild germplasms and various breeding lines of typical variety groups in order to understand the genetic diversity and the genetic relatedness especially between the breeding lines and the wild germplasms. A total of 100 advanced breeding lines of each typical variety group were assessed for genetic diversity and relatedness. For a comparison, 164 cucumber germplasm accessions were systematically sampled from the USDA NPGS based on the diversity. Among the germplasm accessions, 135 accessions are the same materials that were used for a previous study on genetic diversity, in which 76 out of 135 accessions were selected as a core collection [9]. This indicates that the germplasm accessions used in this study could be considered as a representative collection of the USDA cucumber collection.
Using these materials, the STRUCTURE analysis of the 264 Cucumis accessions revealed three distinct groups, namely group 1 (India and South Asia), group 2 (East Asia), and group 3 (others, including Central/West Asia and Europe), according to the geographic origin and the domestication process, which is in accordance with the previous studies [9]. Based on the PCA and the phylogenetic analysis, we could confirm that the advanced breeding lines were distinguished from the wild germplasm even though they consisted of various variety groups. The 15 variety groups used in this study have wide phenotypic variation in fruit morphology according to the typical consumption area. Therefore, we expected large genetic differences among the breeding lines. However, it is noteworthy that almost all the breeding lines were grouped together, which confirmed their narrow genetic background.
Within the breeding lines, two distinct groups were found. One, mainly cultivated in East Asia, consisted of the WW, WB, HB, and CHS lines, and the other consisted of the slicer and pickling cucumbers that are grown in the USA and some European countries. This result was considered to reflect a narrow genetic background caused by highly integrated breeding within a limited area. Additionally, it supported the idea that cucumber domestication has progressed in two directions, towards East Asia and towards other areas [30].
The three traits assessed in this study (PM resistance, the spine color, and the fruit's stalk-end color) are considered important characteristics in most local markets with respect to cultivation, storage, and consumption. In this study, we found genetic regions that are highly associated with each trait by assessing a large germplasm collection as a natural population.
First, the PM resistance of the 264 Cucumis accessions showed a normal distribution that was similar in the two years, even though stronger resistance was observed in 2018. The difference in the disease index between the two years might be due to the different environmental conditions in each year. Based on the assessment of the disease index, five germplasm accessions with strong resistance to PM were identified, and they were located in the lower part of the phylogenetic tree (Figure 4a). The closest group of breeding lines was the THS, even though the PM resistance of this variety group was lower than that of the wild accessions. Interestingly, among the WW-type breeding lines, BN-019 showed relatively strong PM resistance, and it could potentially be used as a resistance source for the breeding of elite lines.
Using the two-year evaluation data, we obtained consistent results and detected three common significant SNPs in the 16 Mbp region of chromosome 5. This location coincides with the previously reported major QTLs, such as pm5.1 [16,21] and pm5.2 [19,20,44], and with some genes related to disease resistance (Figure 6b) [15]. Two significant SNPs for PM were located within the genes Csa5G471070 and Csa5G453160, which encode an Anaphase-promoting complex subunit and a short-chain dehydrogenase/reductase, respectively (Table 1). Furthermore, one cyclin-like gene (Csa5G488810) and eight R genes (Csa5G429990, Csa5G464830, Csa5G466350, Csa5G467390, Csa5G467900, Csa5G485190, Csa5G494390, and Csa5G505160) were identified within the 1 Mbp region around each significant SNP for PM resistance. The PM candidate genes from this study deserve careful examination, since this is the first report of them in a natural cucumber population that did not show a population-specific bias.
Two market traits, the fruit's stalk-end color (SDc) and the spine color (SPc), are particularly important in East Asian countries. According to the GWAS results, the five significant SNPs for the SDc were directly linked to two genes, and the six significant SNPs for the SPc were directly linked to four genes. For the SDc, two candidate genes, Csa1G006300 and Csa3G824850, which encode Response regulator 6 and a MYB transcription factor, respectively, were identified (Table 1). This is the first report of SNPs and candidate genes related to the cucumber fruit's stalk-end color. These two genes are not directly related to the fruit color. However, in Arabidopsis, the type-A RR genes play important roles in signal transduction in response to a wide range of biotic and abiotic stresses. Their gene products act as negative regulators in the cytokinin signaling pathway [45]. The Csa3G824850 gene shows homology to MYB16 in Arabidopsis, and it has 90% similarity to the MYB76-like isoform X2 in melons. In Arabidopsis, MYB16 controls epidermal cell morphogenesis in the petals and regulates cuticle biosynthesis and wax accumulation in the reproductive organs [46,47].
Notably, all six significant SNPs for the spine color were located in genic regions. The four genes involved, Csa2G368270, Csa3G236570, Csa5G175680, and Csa6G448170, encode a Pentatricopeptide repeat-containing protein, a Zinc-containing alcohol dehydrogenase quinone oxidoreductase, a Pentatricopeptide repeat-containing protein, and a Thaumatin-like protein, respectively (Table 1). According to the previous studies, the cucumber spine color is reported to be regulated by one or two strong dominant B genes. However, in this study, we could not identify a single strong gene, whereas significant SNPs were found on chromosomes 2, 3, 5, and 6. The most recently reported HEUKCHEEM gene was located at 0.62 Mbp on chromosome 4 [30], whereas no relevant SNP was found on chromosome 4 in this study. In general, the PPR genes are involved in many cellular functions and biological processes, which include fertility restoration of cytoplasmic male sterility (CMS) in plants. In this study, however, two PPR genes, Csa2G368270 and Csa5G175680, which show homology to OTP90 and PCMP-H40, were associated with the spine color. These genes play a role in multiple-site RNA editing in chloroplasts, such as C to U editing during the post-transcriptional step or editing of site 7 of the ndhB (ndhB-7) and site 5 of the ndhD (ndhD-5) transcripts in Arabidopsis [48,49]. According to the GO analysis, Csa3G236570 was predicted to have a molecular function in quinone reductase activity, and Csa6G448170 was predicted to be related to the defense response.
In this study, we compared two groups of cucumber germplasm, the wild germplasm and the advanced breeding lines, and assessed their genetic relatedness. Even though the major consumption areas and the horticultural phenotypes of the breeding lines were markedly different, our results confirmed that the genetic backgrounds of the breeding lines were very narrow and clearly distinguished from the wild germplasm. In addition, the breeding lines were divided into two groups that reflected the direction of cucumber domestication. By performing the GWAS, we identified 16 significant SNPs for three important traits (PM, SDc, and SPc) and selected eight candidate genes. These SNPs and candidate genes are expected to be useful for developing molecular markers for cucumber breeding.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4395/10/11/1736/s1, Figure S1: Phenotypic standard of fruit stalk-end color. Fruit stalk-end color was scored as 1 to 3 for GWAS: (a) short green shoulder, (b) long green shoulder, (c) whole green body; Figure S2: Association with fruit stalk-end color and spine color. The green horizontal line in the Manhattan plots indicate the significant threshold based on the Bonferroni correction. The position of candidate genes is indicated by blue dotted lines with the names: (a): genome-wide association to fruit stalk-end color, (b) genome-wide association to spine color; Table S1: List of cucumber germplasm collection used in this study (n = 264); Table S2: SNP distribution among seven cucumber chromosomes detected by GBS; Table S3: Proportions of Q values in three clusters estimated by STRUCTURE analysis in cucumber germplasm collection; Table S4: List of unit orders displayed in the phylogenic tree in this study; Table S5