High-Density Mapping and Candidate Gene Analysis of Pl18 and Pl20 in Sunflower by Whole-Genome Resequencing

Downy mildew (DM) is one of the most severe biotic threats to sunflower production worldwide. The causal pathogen, Plasmopara halstedii, can overwinter in the field for years, creating a persistent threat to sunflower. The dominant genes Pl18 and Pl20, which confer resistance to known DM races, were previously mapped to 1.5 and 1.8 cM intervals on sunflower chromosomes 2 and 8, respectively. In the present study, a whole-genome resequencing strategy combined with reference sequence-based chromosome walking and high-density mapping placed Pl18 in a 0.7 cM interval on chromosome 2. A candidate gene for Pl18, HanXRQChr02g0048181, was identified from the XRQ reference genome and is predicted to encode a protein with the typical NLR domains of disease resistance proteins. Pl20 was placed in a 0.2 cM interval on chromosome 8, and a putative NLR gene for Pl20, HanXRQChr08g0210051, was identified within this interval. SNP markers closely linked to Pl18 and Pl20 were evaluated in 96 diverse sunflower lines, and 13 diagnostic markers for Pl18 and four for Pl20 were identified. These markers will facilitate the transfer of these new genes to elite sunflower lines and their pyramiding for broad-spectrum DM resistance in sunflower breeding.


Introduction
Downy mildew (DM) is a devastating sunflower disease throughout the world, particularly in Europe and North America [1,2]. It is caused by the oomycete pathogen Plasmopara halstedii (Farl.) Berlese & de Toni, which can overwinter and persist in the soil for 5-10 years. Cool, moist soil favors DM epidemics in sunflower fields. Although sunflower is the field crop infected by this DM pathogen, susceptible weeds in the Compositae family, such as marsh elder, can serve as reservoirs for this soil-borne pathogen (https://www.ag.ndsu.edu/extensionentomology/recent-publications-main/publications/A-1331-sunflower-production-field-guide). In the U.S., DM occurs mostly in the Northern Great Plains, and the disease was found in approximately 16% of sunflower fields in 2015 [3]. DM infection causes substantial yield loss: severely infected plants usually fail to develop beyond the seedling stage, and the few infected plants that do reach maturity do not produce viable seeds.
reference genomes, and their physical positions on the genome are unknown. Instead, two SNP markers were selected, SFW03013 (0.1 cM from CRT214) and SFW03060 (0.5 cM from ORS203). They delimit Pl18 to a physical interval of 780,432 bp (128,511,770-129,292,202 bp) in the XRQ genome and 100,508 bp (128,982,063-129,082,571 bp) in the HA412-HO genome (Table 1, Figure 1a). A total of 150 SNPs were selected based on SNPs/InDels between HA-DM1 (carrying Pl18) and the two reference genomes in the targeted region of chromosome 2: 49 from the HA412-HO genome and 101 from the XRQ genome. Forty-three SNP markers were polymorphic between HA 89 and HA-DM1 and were used to genotype the initial 142 BC1F2 individuals.
Thirty-one SNP markers were mapped around Pl18. SNP C2_128652042 was the only marker mapped within the Pl18 interval between SSRs CRT214 and ORS203, 0.3 cM distal to Pl18 (Figure 1b). A large cluster of 26 co-segregating SNPs was proximal to ORS203, at a genetic distance of 0.7 cM. Among the 31 SNP markers mapped to the Pl18 interval between SNP markers SFW03013 and SFW03060, only four were derived from the HA412-HO genome: S2_128980821, S2_128982087, S2_129074800, and S2_129077858. Their physical positions in both the HA412-HO and XRQ assemblies agreed with their genetic positions (Table 1, Figure 1b). However, none of the SNPs designed from the 93 kb region between S2_128982087 and S2_129074800 were mapped. The remaining 27 mapped SNPs were all derived from the XRQ genome (Figure 1b). All of these markers were physically consistent with their genetic positions in the XRQ assembly. However, they did not align to the target region between 128,982,147 and 129,082,571 bp that SFW03013 and SFW03060 delimit in the HA412-HO assembly (Table 1). These 27 mapped SNPs were derived from a 302 kb region of XRQ between C2_128624983 and C2_128926640. In HA412-HO, three aligned to positions 132.74-132.75 Mb, two to 97.02-97.03 Mb, and 21 to 129.43-129.67 Mb; one did not align to chromosome 2 (Table 1).
The results suggest that the 93 kb region between S2_128982087 and S2_129074800 in HA412-HO may not be assembled correctly, which may also explain why no SNPs designed from this region were mapped.
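The marker names in this section encode the reference assembly, chromosome, and physical position of each SNP, following the naming convention given in the SNP marker development methods. Physical gaps such as the 93 kb and 302 kb regions above can therefore be recomputed directly from the names. The Python sketch below assumes only that convention. The helper names `parse_marker` and `physical_span` are hypothetical. Names that do not encode a position, such as SFW03013, are rejected.

```python
import re
from typing import Tuple

# Naming convention from the methods: C = XRQ, S = HA412-HO, then the
# chromosome number, an underscore, and the SNP position in bp.
NAME_PATTERN = re.compile(r"^([CS])(\d+)_(\d+)$")
ASSEMBLY = {"C": "XRQ", "S": "HA412-HO"}


def parse_marker(name: str) -> Tuple[str, int, int]:
    """Return (assembly, chromosome, position_bp) for a position-encoded SNP name."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"marker name does not encode a position: {name}")
    prefix, chrom, pos = match.groups()
    return ASSEMBLY[prefix], int(chrom), int(pos)


def physical_span(left: str, right: str) -> int:
    """Distance in bp between two SNPs on the same assembly and chromosome."""
    asm_l, chr_l, pos_l = parse_marker(left)
    asm_r, chr_r, pos_r = parse_marker(right)
    if (asm_l, chr_l) != (asm_r, chr_r):
        # XRQ and HA412-HO coordinates are not interchangeable.
        raise ValueError("markers come from different assemblies or chromosomes")
    return abs(pos_r - pos_l)


if __name__ == "__main__":
    print(physical_span("S2_128982087", "S2_129074800"))  # the ~93 kb gap
    print(physical_span("C2_128624983", "C2_128926640"))  # the ~302 kb region
```

The sketch deliberately refuses to compare positions across assemblies. As described above, XRQ and HA412-HO coordinates for this region do not agree.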

Saturation and Fine Mapping of Pl20
Pl20 was previously placed within a 2.5 Mb interval (11,271,845-13,781,094 bp) on chromosome 8 of the HA412-HO reference genome, flanked by SNP markers S8_11272046 and SFW01496 [1]. A total of 244 SNP markers were selected from the SNP/InDel calls obtained by comparing the whole-genome sequence of HA-DM7 with chromosome 8 of HA412-HO and XRQ. Of these, 84 came from the HA412-HO genome, covering a region of 447 kb (12,254,701,559 bp), and 160 came from the XRQ genome, covering a region of approximately 1.0 Mb (7,890,010-8,906,527 bp). The 244 SNP markers potentially surrounding Pl20 were tested for polymorphism between the parents HA 89 and HA-DM7, and 25 were polymorphic (2 from HA412-HO and 23 from XRQ). These 25 markers were then used to genotype 114 BC1F2 individuals of the original population, and all were mapped around Pl20 (Figure 2b). A cluster of 26 SNPs co-segregated with Pl20, and SNP marker C8_8639656 was proximal to Pl20 at a genetic distance of 0.88 cM. A large population was then genotyped to dissect this marker cluster and build a high-density map of Pl20. It consisted of 2485 BC1F3 individuals selected from BC1F3 families heterozygous for Pl20, and it was screened with two flanking SNP markers, SFW01920 and S8_100385559. A total of 214 BC1F3 recombinants were identified in the target region delimited by these two markers and advanced to the next generation for DM testing of the recombinant families.
Twenty-two co-dominant SNP markers from the saturation map were selected to genotype the 214 recombinants identified in this large population. Seven previously mapped SNP markers in the Pl20 region were also included in the fine mapping [1]: SFW01920, SFW09076, S8_11272025, S8_11272046, SFW04358, SFW02745, and S8_100385559. The combined phenotypic and marker data of the recombinants placed Pl20 in a 0.20 cM interval flanked by SNP markers C8_7921819 (0.06 cM) and C8_8012577 (0.14 cM) (Figure 2c). This genetic interval corresponds to a 91.2 kb segment in the XRQ assembly (Table 2).
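The combined phenotype and marker data can place a gene between two markers by a simple rule. The recombinants rule out every interval on the side of their crossover where the marker genotypes disagree with the family phenotype. The Python sketch below illustrates that logic under stated assumptions and is not presented as the procedure used in this study. It assumes at most one crossover per recombinant and co-dominant calls coded A (HA 89 allele), H (heterozygous), and B (donor allele). The phenotype coding, the function `locate_gene`, and the toy markers M1-M5 are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed coding of a recombinant family's DM test as the gene genotype it
# implies: homozygous susceptible -> "A" (HA 89 allele), homozygous
# resistant -> "B" (donor allele), segregating -> "H" (heterozygous).
PHENOTYPE_TO_GENOTYPE = {"S": "A", "R": "B", "Seg": "H"}

# recombinant id -> (marker calls in map order, family phenotype)
Recombinants = Dict[str, Tuple[List[str], str]]


def locate_gene(markers: List[str],
                recombinants: Recombinants) -> Tuple[Optional[str], Optional[str]]:
    """Return the markers flanking the gene; None means an open end of the map.

    Gap g lies between markers[g - 1] and markers[g]; gaps 0 and len(markers)
    are the open ends. Assumes at most one crossover per recombinant.
    """
    n = len(markers)
    consistent_gaps = []
    for gap in range(n + 1):
        consistent = True
        for calls, phenotype in recombinants.values():
            expected = PHENOTYPE_TO_GENOTYPE[phenotype]
            flanks = [calls[i] for i in (gap - 1, gap) if 0 <= i < n]
            # The gene cannot sit in this gap if every flanking marker
            # disagrees with the genotype implied by the phenotype.
            if all(call != expected for call in flanks):
                consistent = False
                break
        if consistent:
            consistent_gaps.append(gap)
    if not consistent_gaps:
        raise ValueError("no position is consistent with all recombinants")
    first, last = consistent_gaps[0], consistent_gaps[-1]
    left = markers[first - 1] if first > 0 else None
    right = markers[last] if last < n else None
    return left, right


# Hypothetical toy data (not from this study).
TOY_MARKERS = ["M1", "M2", "M3", "M4", "M5"]
TOY_RECOMBINANTS: Recombinants = {
    "r1": (["H", "B", "B", "B", "B"], "R"),
    "r2": (["A", "A", "A", "H", "H"], "S"),
    "r3": (["A", "A", "H", "H", "H"], "Seg"),
    "r4": (["H", "H", "B", "B", "B"], "Seg"),
}

if __name__ == "__main__":
    print(locate_gene(TOY_MARKERS, TOY_RECOMBINANTS))
```

With all four toy recombinants, the gene is placed between M2 and M3. If r2 and r3 are dropped, M2 co-segregates with the gene and the interval widens to M1-M3. This mirrors how dissecting a co-segregating cluster requires additional recombinants.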

Identification of Candidate Genes for Pl18 and Pl20
In the high-density map, all of the newly developed SNP markers mapped around Pl18 were physically located in a 780 kb region between 128,511,770 and 129,292,202 bp on chromosome 2 of the XRQ assembly (Table 1). The genetic and physical positions of these markers are consistent in the XRQ genome. The 780 kb genomic sequence of the target region was therefore retrieved from the XRQ database (https://www.heliagene.org/HanXRQ-SUNRISE/) for analysis. Seven high-confidence genes were found in the target region. One of them, HanXRQChr02g0048181, is predicted to encode a powdery mildew resistance protein with the typical nucleotide-binding and leucine-rich repeat (NLR) domains of disease resistance genes (Table 3). This gene spans 128,920,787 to 128,926,787 bp on chromosome 2, a length of 6 kb. Its genetic and physical positions, functional domains, and predicted function support it as a candidate gene for Pl18. Similarly, an approximately 610 kb genomic sequence (7,894,504,555 bp) between SNP markers C8_7895128 and C8_8504355 on chromosome 8 was extracted from the XRQ database for annotation. Seven genes were identified in this region. Of these, HanXRQChr08g0210051 has a typical NLR domain, and HanXRQChr08g0210151 has a leucine-rich repeat domain (Table 3). HanXRQChr08g0210051 spans 8,010,685 to 8,035,718 bp on chromosome 8, a length of 25 kb. It overlaps the Pl20 interval between SNP markers C8_7921819 and C8_8012577 and could be a candidate gene for Pl20 (Table 3). HanXRQChr08g0210151 spans 8,408,566 to 8,411,305 bp on chromosome 8, a length of 2.7 kb (Table 3).
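Whether a predicted gene lies inside a marker-delimited interval is an interval-overlap question. The Python sketch below uses the XRQ coordinates quoted above and assumes that gene and interval coordinates are both inclusive. The interval ends are the positions encoded in the flanking marker names. With these numbers, HanXRQChr02g0048181 lies entirely within the 780 kb Pl18 region. HanXRQChr08g0210051 starts inside the C8_7921819-C8_8012577 interval and extends beyond C8_8012577. The function names are illustrative.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp), both ends inclusive (assumed)


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two inclusive intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def placement(gene: Interval, target: Interval) -> str:
    """Classify a gene as 'inside', 'partial', or 'outside' a target interval."""
    shared = overlap_bp(gene, target)
    if shared == 0:
        return "outside"
    if shared == gene[1] - gene[0] + 1:
        return "inside"
    return "partial"


PL18_REGION = (128_511_770, 129_292_202)  # XRQ chr 2, SFW03013-SFW03060
PL20_INTERVAL = (7_921_819, 8_012_577)    # XRQ chr 8, C8_7921819-C8_8012577

if __name__ == "__main__":
    print(placement((128_920_787, 128_926_787), PL18_REGION))  # HanXRQChr02g0048181
    print(placement((8_010_685, 8_035_718), PL20_INTERVAL))    # HanXRQChr08g0210051
    print(placement((8_408_566, 8_411_305), PL20_INTERVAL))    # HanXRQChr08g0210151
```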

Development of Diagnostic Markers for Pl18 and Pl20
The 31 new SNP markers mapped around Pl18 in the saturation map were first tested in six sunflower lines. These were four resistant lines, HA 458 (Pl17), HA-DM1 (Pl18), RHA 340 (Pl8), and RHA 464 (PlArg), and two susceptible lines, HA 89 and CONFSCLB1. Twenty-two markers showed a PCR pattern unique to HA-DM1 compared with the other three resistant lines and the two susceptible lines. These 22 markers were further genotyped in an evaluation panel of 96 selected sunflower lines (Supplementary Table S2; Figure 3a). The aim was to determine their specificity and assess their potential for marker-assisted selection (MAS) of Pl18. The Pl18-introgressed lines HA-DM1 and HA-DM4 carried unique Pl18 marker alleles that distinguished them from the other sunflower lines (Figure 3a).
Eight SNP markers fine-mapped around Pl20 were tested for specificity in the same evaluation panel of 96 sunflower lines. Four of them showed patterns unique to HA-DM7 (the Pl20-introgressed line) compared with the lines lacking Pl20 (Table 2; Figure 3b). The diagnostic markers identified for Pl18 and Pl20 will be valuable for selecting both genes in sunflower breeding programs.
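A diagnostic marker has an allele pattern that is shared by the gene carriers and absent from every other line in the panel. This criterion can be written as a short filter. In the Python sketch below, the function name `diagnostic_markers` and the allele codes are hypothetical placeholders. In the study, patterns were scored on polyacrylamide gels rather than taken from a table like this.

```python
from typing import Dict, List, Set

# marker -> {line name: allele pattern scored for that line}
Panel = Dict[str, Dict[str, str]]


def diagnostic_markers(panel: Panel, carriers: Set[str]) -> List[str]:
    """Markers whose carriers share one allele that no non-carrier line has."""
    diagnostic = []
    for marker, calls in panel.items():
        carrier_alleles = {calls[line] for line in carriers}
        other_alleles = {allele for line, allele in calls.items()
                         if line not in carriers}
        # All carriers must agree, and their allele must be absent elsewhere.
        if len(carrier_alleles) == 1 and carrier_alleles.isdisjoint(other_alleles):
            diagnostic.append(marker)
    return diagnostic


# Hypothetical allele calls for a tiny panel (not data from this study).
TOY_PANEL: Panel = {
    "snp_a": {"HA-DM1": "b", "HA-DM4": "b", "HA 89": "a", "RHA 340": "a", "HA 458": "c"},
    "snp_b": {"HA-DM1": "b", "HA-DM4": "b", "HA 89": "a", "RHA 340": "b", "HA 458": "a"},
    "snp_c": {"HA-DM1": "b", "HA-DM4": "a", "HA 89": "a", "RHA 340": "a", "HA 458": "a"},
}

if __name__ == "__main__":
    print(diagnostic_markers(TOY_PANEL, {"HA-DM1", "HA-DM4"}))
```

Only `snp_a` passes when HA-DM1 and HA-DM4 are both treated as carriers. `snp_b` fails because a non-carrier shares the carrier allele, and `snp_c` fails because the carriers disagree.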
Genomic regions encompassing NLR clusters most likely arose through duplications, presumably including the chromosome doubling thought to have occurred during sunflower evolution [13]. Three models have been proposed for the fate of duplicated genes [23]:
- pseudogenization (loss of regulatory sub-function),
- sub-functionalization (partitioning of function between daughter copies), and/or
- neo-functionalization (functional diversification).

Some NLR genes within the clusters may have been pseudogenized after duplication or during interaction with pathogens. The genes conferring resistance might therefore be absent from the XRQ reference genome even where genes with typical NLR motifs are present. To validate their candidacy, the candidate genes predicted from the reference genome must be confirmed in the resistance donor lines and then functionally characterized. Assembling a scaffold that covers an entire gene from the sequenced donor line is difficult because Illumina reads are short and the sunflower genome is highly repetitive. In a previous study, most contigs (81%) and scaffolds (86%) were 100-500 bp long, and only 6% of contigs and 8% of scaffolds exceeded 1 kb, leaving many gaps [24]. The present study localizes Pl18 and Pl20 physically to regions of less than 100 kb on chromosomes 2 and 8, respectively. This is a significant step toward the cloning and functional characterization of these R loci. PacBio long-read targeted sequencing provides a powerful tool for capturing the two genomic regions harboring the candidate genes. Combined with analysis of ethyl methanesulfonate (EMS)-induced mutants, this technology will allow us to distinguish among these possibilities and to uncover the genetic and molecular basis of DM resistance in sunflower.
Most DM R genes are located in clusters on sunflower chromosomes 1, 4, 8, and 13, as mentioned above. By contrast, only two genes, Pl18 and Pl26, have been mapped to sunflower chromosome 2 [25]. Because of limited mapping resolution and a lack of recombination in the region, Pl26 was placed in a relatively large interval of 114 Mb on XRQ (26,000,000-140,000,000 bp). Pl18, however, was located within the 128,640,208-129,297,096 bp interval of chromosome 2. Pl18 originated from H. argophyllus accession PI 494573, collected in Texas, U.S., whereas Pl26 originated from H. annuus HAS103. Pl18 falls within the large region encompassing Pl26 on chromosome 2, but their different origins suggest that they are different resistance genes. Further fine mapping of Pl26 would clarify the genetic relationship between the two genes.
Chromosome 8 of sunflower harbors the largest and most important NLR cluster, comprising 54 NLR loci [26]. The DM R gene cluster lies in the first and largest sub-cluster. This sub-cluster contains Pl1, Pl2, Pl6, Pl7, Pl15, and Pl20 and two rust R genes, R1 and R15 [1]. Pl20 originated from H. argophyllus and so differs in origin from the other Pl genes in the cluster [1,27-31]. Pl1, Pl2, and Pl6 came from wild H. annuus and Pl7 from H. praecox, and Pl15 was identified in an Argentinian restorer inbred line. Pl20 confers immunity to all P. halstedii races identified in North America, including the predominant and virulent races. The other Pl genes in the cluster, except Pl15, have already been overcome by some or all of the identified P. halstedii races. In the current study, four SNP markers showed PCR patterns unique to HA-DM7 (the Pl20-introgressed line): C8_7890010, C8_7895128, C8_7919216, and C8_8800366. These markers distinguish HA-DM7 from lines carrying the other Pl genes in the cluster. Together, these findings suggest that Pl20 is a novel and highly effective DM R gene. It is a powerful resistance resource for durable DM control in sunflower.
MAS has been used extensively in modern plant breeding, especially for traits controlled by single genes. The success of MAS depends on how tightly the markers are linked to the genes of interest, so the recombination frequency between the target gene and the marker should be as low as possible. The high-resolution genetic maps and diagnostic markers developed here for Pl18 and Pl20 will be useful tools for transferring these new genes to elite sunflower lines in breeding programs.
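The point about recombination frequency can be made quantitative. The Python sketch below converts map distances to recombination fractions with the inverse of Kosambi's mapping function, the function used for the linkage analysis in this study. It then estimates how often one marker, or a pair of flanking markers, would fail to track the gene in a single meiosis. The example distances are the Pl20 flanking distances reported above. Treating the two flanking intervals as independent is a simplifying assumption, because Kosambi's function itself allows for some interference. The flanking-marker figure is therefore only an approximation.

```python
import math


def kosambi_r(d_cM: float) -> float:
    """Recombination fraction for a Kosambi map distance given in cM."""
    d_morgans = d_cM / 100.0
    return 0.5 * math.tanh(2.0 * d_morgans)


def kosambi_cM(r: float) -> float:
    """Kosambi map distance (cM) for a recombination fraction 0 <= r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def single_marker_failure(d_cM: float) -> float:
    """Chance that one marker and the gene are separated in one meiosis."""
    return kosambi_r(d_cM)


def flanking_failure(d_left_cM: float, d_right_cM: float) -> float:
    """Approximate chance that both flanking markers recombine away from the
    gene in one meiosis, assuming the two intervals recombine independently."""
    return kosambi_r(d_left_cM) * kosambi_r(d_right_cM)


if __name__ == "__main__":
    # Pl20 flanks: C8_7921819 at 0.06 cM and C8_8012577 at 0.14 cM.
    print(f"single marker at 0.14 cM: {single_marker_failure(0.14):.6f}")
    print(f"both flanks: {flanking_failure(0.06, 0.14):.2e}")
```

Selecting on both flanking markers reduces the expected failure rate from roughly one in a thousand meioses to well under one in a million. This is why flanking markers are preferred in MAS.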

Mapping Populations and Evaluation Panel
The initial Pl18 mapping population, consisting of 142 BC1F2 individuals, was developed from a cross of nuclear male sterile (NMS) HA 89 × H. argophyllus accession PI 494573, with the F1 backcrossed to normal HA 89 [7]. Similarly, the initial Pl20 mapping population, consisting of 114 BC1F2 individuals, was developed from a cross of NMS HA 89 × H. argophyllus accession PI 494578 [7]. The H. argophyllus accessions PI 494573 and PI 494578 were found to be resistant to new races of P. halstedii, whereas HA 89 is susceptible to all P. halstedii races [6]. The germplasms HA-DM1 (Pl18) and HA-DM7 (Pl20) were developed and released in 2015 and 2019, respectively [8]. They served as the Pl18 and Pl20 donor lines for whole-genome resequencing and high-density mapping of both R genes.
For fine mapping of Pl20, recombinants were screened from 2,485 BC1F3 individuals selected from previously characterized BC1F2:3 families heterozygous for Pl20. Each selected heterozygous F3 family is equivalent to an F2 population segregating for Pl20.
The specificity of the diagnostic Pl18 and Pl20 SNP markers was tested in a sunflower evaluation panel of 96 inbred lines of diverse origins. The panel includes 24 lines with different DM R genes and 17 lines with different rust R genes (Supplementary Table S2).

SNP Marker Development from Whole-Genome Resequencing
HA-DM1 (Pl18) and HA-DM7 (Pl20) were each subjected to whole-genome sequencing by CD Genomics Inc. on the Illumina HiSeq platform. Following the provider's protocols, the genomic DNA of HA-DM1 and HA-DM7 was first checked to ensure that contamination and degradation were low enough to meet the provider's requirements. The high-quality genomic DNA was sheared with an S/E210 focused ultrasonicator (Covaris, Woburn, MA, USA) for library construction. Qualified libraries for Pl18 and Pl20 were pooled and sequenced at 40× genome coverage. Three types of raw reads were removed before further analysis:
- reads containing adaptors,
- reads with >1% ambiguous bases, and
- low-quality reads (more than 50% of bases with a quality score below Q15).

The clean reads were aligned to the XRQ (https://www.heliagene.org/HanXRQ-SUNRISE/) and HA412-HO (https://www.heliagene.org/HA412.v1.1.bronze.20141015/) reference genomes. After low-quality reads were filtered out, a total of 1,166,680,112 (99.09%) HA-DM1 reads and 1,023,555,572 (98.74%) HA-DM7 reads were mapped to the references XRQ and HA412-HO, respectively. All SNPs and InDels were identified from the mapped reads. SNP markers were named with the prefix C2, S2, C8, or S8, followed by a number giving the physical position of the SNP on chromosome 2 or 8 of the corresponding reference assembly. The prefixes C2 and C8 denote SNPs from chromosomes 2 and 8 of the XRQ reference genome. The prefixes S2 and S8 denote SNPs from chromosomes 2 and 8 of the HA412-HO reference genome.
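The three read-level filters listed above can be sketched in Python as follows. They cover adaptor contamination, >1% ambiguous bases, and more than 50% of bases below Q15. This illustrates the stated criteria and is not the sequencing provider's pipeline. The sketch assumes Phred+33 quality encoding and treats "N" as the ambiguous base. It detects adaptors by exact substring match, whereas dedicated trimming tools match more tolerantly. The adaptor sequence and function names are illustrative.

```python
from typing import List

# Common Illumina adapter prefix, used here only as an example sequence.
EXAMPLE_ADAPTER = "AGATCGGAAGAGC"


def phred33_scores(quality: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality]


def passes_filters(seq: str, quality: str, adapter: str = EXAMPLE_ADAPTER,
                   max_n_fraction: float = 0.01, low_q: int = 15,
                   max_low_q_fraction: float = 0.5) -> bool:
    """Apply the three read filters described in the text to one read."""
    if adapter and adapter in seq.upper():
        return False  # adaptor contamination (exact-match check only)
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return False  # more than 1% ambiguous bases
    scores = phred33_scores(quality)
    low = sum(1 for q in scores if q < low_q)
    if low / len(scores) > max_low_q_fraction:
        return False  # more than half of the bases below Q15
    return True


if __name__ == "__main__":
    read = "ACGT" * 25                       # 100 bp toy read
    print(passes_filters(read, "I" * 100))   # 'I' encodes Q40
    print(passes_filters("NN" + read[2:], "I" * 100))
```

The thresholds are strict inequalities, following the wording of the criteria. A read with exactly 1% N bases, or with exactly half of its bases below Q15, is kept.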

Genotyping of PCR-Based SNP Markers and Linkage Analysis
Polymerase chain reaction (PCR)-based SNP primers were designed with the Primer3 program from the SNP flanking sequences (Supplementary Tables S4 and S5). Specific mismatches and length polymorphisms were introduced into the primers (Supplementary Table S3) as described by Qi et al. [32] and Long et al. [33]. PCR for SNPs was conducted as described by Ma et al. [34], and amplicons were scored on 6.5% polyacrylamide gels using an IR2 4300/4200 DNA analyzer (LI-COR, Lincoln, NE, USA).
A chi-square (χ²) test was performed on the genotyping data of each marker to test goodness-of-fit to the expected Mendelian segregation ratio: 1:3 for dominant markers and 1:2:1 for co-dominant markers. Markers that deviated from these ratios were excluded. The remaining markers were analyzed for linkage with the Pl18 or Pl20 phenotypic data using JoinMap 4.1 software, with the regression mapping algorithm and Kosambi's mapping function [35]. Linkage thresholds were set at a logarithm of odds (LOD) ≥ 3.0 and a maximum genetic distance ≤ 50 centimorgans (cM).
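The segregation test above can be sketched as a goodness-of-fit statistic compared with the 5% critical value. The Python sketch below assumes that significance level, which the text does not state. It hard-codes the upper-tail critical values for 1 and 2 degrees of freedom (3.841 and 5.991). The counts in the usage example are hypothetical and sized like the 142-individual Pl18 population. `scipy.stats.chisquare` computes the same statistic, together with a p-value, if SciPy is available.

```python
from typing import Sequence

# Upper-tail chi-square critical values at alpha = 0.05 (assumed level).
CRITICAL_05 = {1: 3.841, 2: 5.991}


def chi_square(observed: Sequence[int], ratio: Sequence[float]) -> float:
    """Goodness-of-fit statistic of observed class counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * part / ratio_sum for part in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_ratio(observed: Sequence[int], ratio: Sequence[float]) -> bool:
    """True if the counts do not deviate significantly from the ratio."""
    df = len(ratio) - 1
    return chi_square(observed, ratio) < CRITICAL_05[df]


if __name__ == "__main__":
    # Co-dominant marker, classes (HA 89 type, heterozygous, donor type), 1:2:1.
    print(fits_ratio([35, 72, 35], (1, 2, 1)))
    # Dominant marker, classes (band absent, band present), 1:3.
    print(fits_ratio([32, 110], (1, 3)))
    # A distorted co-dominant marker that would be excluded.
    print(fits_ratio([10, 100, 32], (1, 2, 1)))
```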

Phenotypic Evaluation of Recombinants
The Pl20 recombinants identified with the flanking markers were tested for DM resistance together with the Pl20-introgressed line HA-DM7 and the susceptible parent HA 89. Testing used the P. halstedii race 734 isolate and the whole-seedling immersion method as described by Gulya et al. [36] and Qi et al. [32]. Briefly, approximately 40 seeds from each recombinant family were germinated and inoculated with the race 734 isolate after 2-3 days, and at least 30 seedlings per recombinant family were evaluated. Susceptible seedlings showed sporulation on cotyledons and true leaves, whereas resistant seedlings showed no sporulation. Each recombinant was classified as homozygous susceptible if all seedlings in its family showed sporulation on cotyledons and true leaves. It was classified as homozygous resistant if no seedlings sporulated, and as segregating if some seedlings sporulated and others did not.
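The family-level scoring rule above maps directly onto a small function. In this Python sketch, the 30-seedling minimum comes from the text. The function name and the choice to reject families with fewer scored seedlings are illustrative assumptions. The all-or-none rule is applied literally, and the sketch does not model disease escapes or misclassified seedlings.

```python
def classify_family(sporulating: int, non_sporulating: int,
                    min_scored: int = 30) -> str:
    """Infer a recombinant's Pl20 genotype from its family's seedling scores."""
    scored = sporulating + non_sporulating
    if scored < min_scored:
        raise ValueError(f"only {scored} seedlings scored; need at least {min_scored}")
    if non_sporulating == 0:
        return "homozygous susceptible"  # every seedling sporulated
    if sporulating == 0:
        return "homozygous resistant"    # no seedling sporulated
    return "segregating"                 # a mix of both classes


if __name__ == "__main__":
    print(classify_family(32, 0))
    print(classify_family(0, 35))
    print(classify_family(9, 27))
```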
Supplementary Materials: The following are available online at http://www.mdpi.com/1422-0067/21/24/9571/s1:
- Table S1. Originating species of designated downy mildew resistance genes.
- Table S2. Evaluation panel of 96 sunflower lines used to determine the specificity of Pl18 and Pl20 markers.
- Table S3. Primer sequences of SNP markers mapped in the present study.
- Table S4. Sequences of SNP markers mapped around Pl18 in the present study (diagnostic SNP markers for Pl18 are shown in bold).
- Table S5. Sequences of SNP markers mapped around Pl20 in the present study (diagnostic SNP markers for Pl20 are shown in bold).

References are cited in the Supplementary Materials.

Funding: This project was supported by the USDA-AMS Specialty Crop Block Grant Programs 16-SCBGPND-0029 and AM170100XXXXG05 and the USDA-ARS CRIS Project No. 3060-2100-043-00D. The mention of trade names or commercial products in this report is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The USDA is an equal opportunity provider and employer.