Identification of Novel Loci and Candidate Genes for Cucumber Downy Mildew Resistance Using GWAS

Downy mildew (DM) is one of the most serious diseases in cucumber. Multiple quantitative trait loci (QTLs) for DM resistance have been detected in a limited number of cucumber accessions. In this study we applied genome-wide association analysis (GWAS) to detected genetic loci for DM resistance in a core germplasm (CG) of cucumber lines that represent diverse origins and ecotypes. Phenotypic data on responses to DM infection were collected in four field trials across three years, 2014, 2015, and 2016. With the resequencing data of these CG lines, GWAS for DM resistance was performed and detected 18 loci that were distributed on all the seven cucumber chromosomes. Of these 18 loci, only six (dmG1.4, dmG4.1, dmG4.3, dmG5.2, dmG7.1, and dmG7.2) were detected in two experiments, and were considered as loci with a stable effect on DM resistance. Further, 16 out of the 18 loci colocalized with the QTLs that were reported in previous studies and two loci, dmG2.1 and dmG7.1, were novel ones identified only in this study. Based on the annotation of homologous genes in Arabidopsis and pairwise LD correlation analysis, several candidate genes were identified as potential causal genes underlying the stable and novel loci, including Csa1G575030 for dmG1.4, Csa2G060360 for dmG2.1, Csa4G064680 for dmG4.1, Csa5G606470 for dmG5.2, and Csa7G004020 for dmG7.1. This study shows that the CG germplasm is a very valuable resource carrying known and novel QTLs for DM resistance. The potential of using these CG lines for future allele-mining of candidate genes was discussed in the context of breeding cucumber with resistance to DM.


Introduction
Cucumber (Cucumis sativus L.) is an economically important vegetable crop produced all over the world. Downy mildew (DM), caused by the obligate biotrophic oomycete Pseudoperonospora cubensis ((Berkeley and Curtis) Rostovzev), is one of the most serious diseases threatening cucumber production [1,2]. The symptoms of cucumber DM mainly appear on the upper side of infected leaves starting with chlorotic (yellow) lesions that are restricted by major leaf veins. Symptom development is sensitive to environmental conditions and in favor of wet and humid conditions, since the pathogen needs moisture for germination and starting of a new infection. Within days following the infection under appropriate environmental conditions, chlorotic lesions become necrotic and sporulation (with a brownish gray downy appearance) may be seen on the lower side of infected leaves. With the disease growth, infected leaves will eventually collapse, leading to losses of cucumber production due to lack of photosynthetic leaves.  Berg J., (2020) [11] 2. Results

Genetic Diversity of DM Resistance in the CG Germplasm
Based on natural infection, the CG lines were scored for DM resistance in the spring of 2014 (dm_2014S), the autumn of 2014 (dm_2014A), the autumn of 2015 (dm_2015A), and the spring of 2016 (dm_2016S), respectively (Supplementary File 1, Figure 1). The mean DI of dm_2014A and dm_2015A was higher than that of dm_2014S and dm_2016S (Figure 2a), indicating that DM was more severe in the autumn. In total, four cucumber ecotypes were evaluated, including the East Asian type (n = 37 CG lines), the Eurasian type (n = 30), the Indian type (n = 22), and the Xishuangbanna type (n = 8) (Supplementary File 2). A similar mean DI was found for all market types except for the Xishuangbanna type that had a lowest mean DI in the spring of 2014 ( Figure 2b). However, it was difficult to conclude that the Xishuangbanna type was more resistant since only eight CG lines were included and DI data were missing for most of them in two experiments (Supplementary Files 1 and 2).    The heatmap depicting the phenotypic distribution of disease index (DI) of downy mildew (DM) resistance in four environments. The three clusters of the core germplasm (CG) lines are numbered with 1 to 3, and data of mean DI per CG line in the four experiments were used for this clustering analysis. The numbers next to the color key represent DI values. Blue means resistant and red susceptible. The intensity of the color indicates the level of resistance/susceptibility.  Based on the average DI per line of the four experiments, these CG lines could be grouped into three clusters-1: Susceptible, 2: Intermediate resistant, and 3: Resistant ( Figure 1). Each cluster contains different ecotypes (Supplementary file 1). However, disease scores of some lines differed greatly among experiments. For example, the CG18 and CG14 line in the resistant cluster were susceptible in the spring of 2014 (dm_2014S) (Figure 1). This could be due to the fact that the naturally occurring DM infection was possibly not evenly distributed. Thus, we combined all phenotypic data of the four experiments to differentiate resistant and susceptible lines. A CG line was considered highly susceptible (HS) if it had a DI above 50 in at least one of the four experiments, or highly resistant (HR) if it had a DI smaller than 35 in all the experiments (Figure 1). From the tested 97 CG lines, we could define 18 HR and 22 HS lines, which are presented by different ecotypes (Supplementary File 3). Further, a normal DI distribution was found for three of the four experiments except for the spring of 2016, which suggested that DM resistance in this CG population was controlled by multiple QTLs ( Figure S1). Significant correlations were found among DIs of three experiments, dm_2014S, dm_2014A, and dm_2015A ( Figure S1). Since there was no good correlation of dm_2016S with other years, DI data of 2014 and 2015 were used for further analysis.

Genome-Associated Analysis of DM Resistance
Using the DI data of the three experiments in 2014 and 2015, genome-wide association analysis (GWAS) was performed and 18 genetic loci were detected contributing to DM resistance. These 18 loci were distributed over the seven chromosomes ( Figure 3). Of the 18 loci, only six loci (dmG1.4, dmG4.1, dmG4.3, dmG5.2, dmG7.1m and dmG7.2) were repeatedly detected in two experiments over two years, which were considered loci with a stable effect on DM resistance. By comparing the SNP (single nucleotide polymorphism) positions of these 18 loci with QTLs reported in previous studies ( Supplementary File 4), it was evident that 16 of the 18 loci of this study colocalized with the known ones ( Figure 4). Only two loci, dmG2.1 and dmG7.1, had not been reported before and were therefore considered as novel loci for DM resistance. dmG4.1, dmG4.3, dmG5.2, dmG7.1m and dmG7.2) were repeatedly detected in two experiments over two years, which were considered loci with a stable effect on DM resistance. By comparing the SNP (single nucleotide polymorphism) positions of these 18 loci with QTLs reported in previous studies (Supplementary File 4), it was evident that 16 of the 18 loci of this study colocalized with the known ones ( Figure 4). Only two loci, dmG2.1 and dmG7.1, had not been reported before and were therefore considered as novel loci for DM resistance.

Candidate Gene Analysis for the Novel and Stable Loci
In order to identify potential candidate genes in novel and stable loci (detected in more than one experiment), the 50-kb region around the peak SNPs was used for further analysis with the cucurbit genome (http://cucurbitgenomics.org/). The choice of the 50-kb region was based on the fact that LD decay was less than 56 kb in most material in cucumber [13].

Candidate Gene Analysis for the Novel and Stable Loci
In order to identify potential candidate genes in novel and stable loci (detected in more than one experiment), the 50-kb region around the peak SNPs was used for further analysis with the cucurbit genome (http://cucurbitgenomics.org/). The choice of the 50-kb region was based on the fact that LD decay was less than 56 kb in most material in cucumber [13]. In the 50-kb region of the novel locus dmG2.1, six genes were present (Figure 5a,b): Csa2G059820 (encoding a pentatricopeptide repeat-containing protein), Csa2G059830 (a homogentisate phytyltransferase), Csa2G059840 (a lysine-tRNA ligase), Csa2G060350 (a transcription factor/transcription regulator), Csa2G060360 (an unknown protein), and Csa2G060370 (a Fe/S biogenesis protein). Only SNPs in the Csa2G059830 gene showed a significant DI difference between haplotypes of the defined HS and HR lines (using DI data of three experiments: Spring in 2014, autumn in 2014, and autumn in 2015) ( Figure 5c). In the HR and HS lines, 18 of the 22 HS lines carried the ATC haplotype, while 11 of the 18 HR lines had the GCT haplotype (Supplementary File 5, Figure 5b). According to the transcriptomic data of PRJNA285071 (http://cucurbitgenomics.org/rnaseq/cu/19), the expression of Csa2G059830 was higher in the susceptible material (Vlaspik) compared with the resistant material (PI 197088). Further, the expression was induced in both Vlaspik and PI 197088 upon the DM infection ( Figure 5d). For the novel locus dmG7.1, four genes were predicted in the 50-kb region ( Figure 6b). All four genes encoded a protein related to the isoflavone reductase. Among the SNPs in these four genes, six SNPs were found in the intron region of Csa7G002520 ( Figure 6c) and showed significance on DIs between haplotypes of the defined HR and HS lines (data of two experiments: Spring in 2014 and autumn in 2015). In the HR and HS lines, 15 out of the 18 HR lines carried the TCTAAC haplotype, while 10 out of the 22 HS lines had the GTGGG haplotype (Supplementary file 5) (Figure 6b). The expression level of this gene was comparable between Vlaspik and PI 197088 until 3 days postinoculation (dpi) with DM (Figure 6d), while the expression was induced in both genotypes starting from 2 dpi and the induction was stronger in PI 197088 than Vlaspik.  [24]. * and ** indicate significance at p < 0.05 and p < 0.01, respectively.
For the novel locus dmG7.1, four genes were predicted in the 50-kb region (Figure 6b). All four genes encoded a protein related to the isoflavone reductase. Among the SNPs in these four genes, six SNPs were found in the intron region of Csa7G002520 ( Figure 6c For the stable locus dmG1.4, the candidate region (Chr.1: 21,711-21,781 kb) was analyzed by pairwise LD correlations leading to a region from 21,753,524 to 21,774,330 bp by LD block (Figure 7a). Five candidate genes were located in this region ( Figure 7b). However, none of the SNPs in these genes showed a significant DI difference between haplotypes of the defined HS and HR lines. Nevertheless, the Csa1G575010 gene (encoding a pectinesterase) was interesting since SNPs were located in the CDS (coding sequence) region of this gene that led to amino acid variation (Figure 7c). In the HR and HS lines, all the 18 HR lines carried the TAGTTT haplotype, while only 6 out of the 22 HS lines had the alternate GGCACC haplotype (Supplementary file 5). Although the DI of the lines carrying the TAGTTT haplotype was lower than the alternate haplotype, the difference was not significant (using data of three experiments: Spring in 2014, autumn in 2014, and autumn in 2015) (Figure 7d). Interestingly, it was highly induced starting from 2dpi in the resistant genotype PI 197088 in contrast to almost no detected expression in the susceptible genotype of Vlaspik (Figure 7e). Possibly, the expression of this gene might play a role in DM resistance. For the stable locus dmG1.4, the candidate region (Chr.1: 21,711-21,781 kb) was analyzed by pairwise LD correlations leading to a region from 21,753,524 to 21,774,330 bp by LD block (Figure 7a). Five candidate genes were located in this region (Figure 7b). However, none of the SNPs in these genes showed a significant DI difference between haplotypes of the defined HS and HR lines. Nevertheless, the Csa1G575010 gene (encoding a pectinesterase) was interesting since SNPs were located in the CDS (coding sequence) region of this gene that led to amino acid variation (Figure 7c). In the HR and HS lines, all the 18 HR lines carried the TAGTTT haplotype, while only 6 out of the 22 HS lines had the alternate GGCACC haplotype (Supplementary file 5). Although the DI of the lines carrying the TAGTTT haplotype was lower than the alternate haplotype, the difference was not significant (using data of three experiments: Spring in 2014, autumn in 2014, and autumn in 2015) (Figure 7d). Interestingly, it was highly induced starting from 2dpi in the resistant genotype PI 197088 in contrast to almost no detected expression in the susceptible genotype of Vlaspik (Figure 7e). Possibly, the expression of this gene might play a role in DM resistance.
For the locus dmG5.2, the 50-kb region around the peak SNPs was further analyzed and a candidate region from 22,883 to 22,913 kb (~30kb) (Figure 8a) was estimated using pairwise LD correlations (r2 ≥ 0.6). In this region, five annotated genes were identified (Figure 8b): Csa5G606470 (a WRKY transcription factor 2-4), Csa5G606480 (an eukaryotic translation initiation factor 5A), Csa5G606490 (a cell division topological specificity factor), Csa5G606500 (a putative uncharacterized protein RAF9-1), and Csa5G606510 (an unknown protein). One SNP_2531403 was located in the CDS of Csa5G606470. On this site, six of the 18 HR lines was G (Pro), and all the HS lines were A (Ser) (Figure 8b, Supplementary File 5). A significant DI difference was found between the two contrast haplotypes For the locus dmG5.2, the 50-kb region around the peak SNPs was further analyzed and a candidate region from 22,883 to 22,913 kb (~30kb) (Figure 8a) was estimated using pairwise LD correlations (r2 ≥ 0.6). In this region, five annotated genes were identified ( Figure 8b): Csa5G606470 (a WRKY transcription factor 2-4), Csa5G606480 (an eukaryotic translation initiation factor 5A), Csa5G606490 (a cell division topological specificity factor), Csa5G606500 (a putative uncharacterized protein RAF9-1), and Csa5G606510 (an unknown protein). One SNP_2531403 was located in the CDS of Csa5G606470. On this site, six of the 18 HR lines was G (Pro), and all the HS lines were A (Ser) (Figure 8b, Supplementary File 5). A significant DI difference was found between the two contrast haplotypes (using data of three experiments: Spring in 2014, autumn in 2014, and autumn in 2015) (Figure 8c). The expression of Csa5G606470 was comparable in Vlaspik and PI 197088 (Figure 8d). For the locus dmG4.1, the candidate region (Chr.4: 5250-5350 kb) was analyzed by pairwise LD correlations leading to a region from 5,250,000 to 5,310,453 bp ( Figure 9a). Four candidate genes were located in this region (Figure 9b). Only for the Csa4G064680 gene, 10 SNPs showed a significant effect on DIs between haplotypes of the defined HR and HS lines (data of autumn of 2015). Of the ten SNPs, two were located in the CDS (Figure 9c, red letter), resulting in amino acid variation (Figure 9c,d). In the HR and HS lines, 20 out of the 22 HS lines carried the GATTATCTAG haplotype, while 6 out of the 18 HR lines had the CGCACATCGC haplotype (Supplementary file 5). The expression of Csa4G064680 was higher and upregulated starting from 2dpi in the resistant genotype PI 197088 (Figure 9e).
For the locus dmG4.3, no SNPs with a significant effect could be identified that colocalize with annotated genes in the analyzed region. The recently cloned CsAAP2A gene was located 200 kb away from the peak dmG4.3. Unfortunately, the mutant allele described in Berg et al. [12] could not explain the DM resistance of dmG4.3, since this mutation was not present in the CG lines (a subset of the resequenced 115 cucumber genome [13]) [12]. Further, the CsLRK10L2 gene underlying the resistance conferred by the DM4.1 QTL in PI 197088 [11] was located 1Mb away from the peak dmG4.2. From the 32 CG lines that were identified carrying the PI 197088 allele [11], DI data were available for 27 lines (Supplementary File 6). Surprisingly, a number of the 27 lines were highly susceptible, such as CG10, CG17, CG23, and CG110. Further, no significant difference was found between the mean DI of these 27 lines and the rest CG lines not carrying the PI 197088 allele ( Figure S2).  For the locus dmG4.3, no SNPs with a significant effect could be identified that colocalize with annotated genes in the analyzed region. The recently cloned CsAAP2A gene was located 200 kb away from the peak dmG4.3. Unfortunately, the mutant allele described in Berg et al. [12] could not explain the DM resistance of dmG4.3, since this mutation was not present in the CG lines (a subset of the resequenced 115 cucumber genome [13]) [12]. Further, the CsLRK10L2 gene underlying the resistance conferred by the DM4.1 QTL in PI 197088 [11] was located 1Mb away from the peak dmG4.2. From the 32 CG lines that were identified carrying the PI 197088 allele [11], DI data were available for 27 For the locus dmG7.2, no SNPs with significant effect on DI could be identified.
In this study, phenotypic data on DI resistance were collected from natural infection in the fields. Many factors, such as the uniformity of infection, environmental conditions of different years/seasons, and pathogen population in the field, may influence the disease score. This is clearly reflected in our data by the mild infection in the spring of 2014 (2014S) and uneven distribution of the infection in the spring of 2016 (2016S). Fortunately, we could combine the DI data of different experiments, which helped prevent false positive evaluation on DM resistance. For example, a group of CG lines were scored HS in 2016S, in contrast to a lower DI scored in 2014S and the autumn of 2015 (2015A) (Figure 1). Therefore, we considered them susceptible to DM. Still, we could not rule out the possibility that the HR lines selected in this study may be due to an escape of infection; thus, they might be susceptible. Nevertheless, we succeeded in identifying 18 genetic loci contributing to DM resistance. Further, 16 of the 18 genetic loci colocalized with QTLs reported in previous studies (Figure 4), which was very supportive to our GWAS results. Thus, we conclude that the CG germplasm is rich in DM resistance and offers valuable donors for breeding cucumber with resistance to DM. However, it is worthwhile to note that of the prevailing DM isolates, many change from one season to the other, and the DM resistance found in the CG lines might be race-specific [34]. This could be one of the explanations for why DI data of 2016 did not have good correlations with the other three experiments. In order to confirm the resistance of these CG lines, an artificial DM inoculation with individual isolates under control conditions is recommended.
In cucumber, three genes, Sgr [9], CsLRK10L2 [11], and CsAAP2A [12] for DM resistance have been cloned. Unfortunately, none of the three cloned genes were re-identified in this study. The Sgr gene is located on chromosome 5 in the region where none of our loci were located ( Figure 4). The donor of the sgr mutant is Gy14 that was included in this study with a code of CG47. This line showed intermediate resistance and was not selected in the HR category. Possibly, DM race(s) in the fields of this study could overcome (partially) the resistance conferred by the sgr mutant. As for the CsAAP2A gene, Berg et al. [12] found an insertion of the CUMULE transposon in the mutant allele of the DM resistant accession PI 197088, leading to a very low expression of CsAAP2A in PI 197088 compared to the susceptible genotype. Unfortunately, the csaap2a mutant allele was not present in the core collection of sequenced 115 cucumber accessions [13]. Therefore, mutation in the CsAAP2A gene could not explain the effect of dmG4.3 in this study. In addition, our results also showed that the CsLRK10L2 gene, which is potentially the casual gene for subQTL DM4.1.2 in PI 197088 [11], could not explain the dmG4.2 effect on DM resistance. It was surprising to see that the average DI of the CG lines carrying the PI 197088 allele of the CsLRK10L2 gene was even more susceptible than the CG lines without this allele ( Figure S2). Meanwhile, it is known that this subQTL, contributing to the restriction of DM sporulation, was only one of three subQTLs of the locus DM4.1 identified in PI 197088 [11]. The effect of this subQTL might be too small to be detected in this study. Alternatively, it might be that the isolates in the fields could be more aggressive overcoming the resistance conferred by this subQTL.
Several candidate genes were predicted as potential causal genes underlying the stable and novel loci, including Csa1G575030 for dmG1.4, Csa2G060360 for dmG2.1, Csa4G064680 for dmG4.1, Csa5G606470 for dmG5.2, and Csa7G004020 for dmG7.1. For the locus dmG5.2, there was an SNP in the CDS of the Csa5G606470 gene and this SNP variation resulted in amino acid changes. This candidate gene was also identified in the study of Zhang et al. [18] based on BSA-seq analysis. The Csa5G606470 gene is homologous to the Arabidopsis gene AT1G13960, which was shown to be responsive to salicylic acid and a negative regulator of the defense response to bacterium [35]. In addition, Arabidopsis homologues of other candidate genes were shown to be related with plant defense responses. For example, the homologue gene of Csa2G059830 in Arabidopsis was involved in regulation of defense response [36]. The Arabidopsis gene homologous to Csa7G002520 was involved in the lignan biosynthetic process [37]. It has been shown that the lignans from Myristica fragrans have antifungal activity that suppressed the development of rice blast and wheat leaf rust [38].
To further study the predicted candidate genes, fine mapping would be the next step for a detailed genetic study by using a population where DM resistance segregates, as this was done in many other studies listed in Table 1. A QTL interval usually contains many genes, and genetic mapping will help to narrow down the QTL interval. This has been demonstrated in the study of Berg et al. [11], wherein three subQTLs were clustered in one QTL locus. Each subQTL contributes to DM resistance by limiting different infection processes of DM including chlorosis, sporulation, and DM-induced necrosis [11]. By developing and using near-isogenic lines, these subQTLs could be differentiated and causal genes underlying two subQTLs have been cloned [11,12]. Additionally, it may be helpful to check the gene expression of the HR and HS CG lines with contrast haplotypes upon DM infection. In this study, we used the available transcriptomic data (http://cucurbitgenomics.org/rnaseq/cu/19) of PI 197088 and Vlaspik as an indication. It is intriguing that the Csa1G575010 gene located in the locus of dmG1.4 was DM-induced in the resistant genotype PI 197088 in contrast to almost no detected expression in the susceptible genotype of Vlaspik (Figure 7e). This might indicate that the expression of this gene might play a role in DM resistance in PI 197088. For a final verification of the candidate genes, overexpression or knock-out the gene would be ideal.

Plant Materials
The 97 CG lines of cucumber were provided by the cucumber research group in the Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, China. This CG population was selected from more than 3000 germplasms worldwide and resequenced [13]. The code, origin, and ecotypes of these 97 CG lines are given in Supplementary File 1. Sequencing data was on the cucumber genome website (http://www.icugi.org/cgi-bin/ICuGI/index.cgi).

Investigation of Disease Index of the Core Germplasm Resistance
All materials were grown in four experiments in the greenhouse of Nankou farm (40 • [15]. All statistics for significant differences among different materials were performed using two-way analysis of variance or a two-tailed, two-sample Student's t-test.

Genetic Diversity of DM Resistance in Germplasm
To cluster the CG lines based on their responses to DM infection, a phylogenetic tree was constructed using SAS 9.0 based on the average DI of each CG line in four experiments and presented in a heatmap [39].

Genome-Wide Association Analyses of DM Resistance
FastLMM was used for genome-wide association analyses, with an estimated relatedness matrix as the covariate, and the genome-wide minimal p-value was recorded. All 97 CG lines were included in GWAS. The 5% lowest minimal p-values were the threshold for genome-wide significance. The Manhattan map for GWAS was generated using R package CMplot [40]. SNP data used for the association analysis were from Qi et al. [13].

Linkage Disequilibrium Analysis
The software Plink [41] was used to calculate the LD coefficient (r 2 ) between pairwise high-quality SNPs. the parameters were set as: '-r2 -ld-window 999999 -ld-window-kb 1000 -ld-window-r2 0 , and the results were used to estimate LD decay.

Candidate Gene Analysis
For each locus, the 50 kb around peak SNPs was considered as the region carrying candidate genes for DM resistance. The physical distance was based on the cucurbit genomics (http://cucurbitgenomics. org/). Using the LDblock program, the candidate genes were predicted by the annotation of homologous gene in Arabidopsis and the SNP variation. The QTL effect was calculated with the R/qtl package with the multiple-QTL model (MQM) [42,43].

Conclusions
In the cucumber CG germplasm, DM resistance was identified, and it is associated with 18 genetic loci, of which 16 were located in chromosomal regions where QTLs have been reported in previous studies. The genetic resistance in CG germplasm could be explained by the previously cloned DM genes including Sgr, CsLRK10L2, and CsAAP2A, suggesting that novel genes are present on the CG germplasm that contribute to the DM resistance. Based on the annotation of homologous genes in Arabidopsis and pairwise LD correlations, five genes were predicted to be the potential candidate genes underlying the resistance of four loci, dmG1.4, dmG2.1, dmG4,1, dmG5.2, and dmG7.1. This study shows that the CG germplasm is a very valuable resource carrying known and novel QTLs for DM resistance, which can be explored in breeding cucumber with resistance to DM.