Development and Validation of Markers for the Fertility Restorer Gene Rf1 in Sunflower

Hybrid breeding in sunflowers based on CMS PET1 requires development of restorer lines carrying, in most cases, the restorer gene Rf1. Markers for marker-assisted selection have been developed, but there is still need for closer, more versatile, and co-dominant markers linked to Rf1. Homology searches against the reference sunflower genome using sequences of cloned markers, as well as Bacterial Artificial Chromosome (BAC)-end sequences of clones hybridizing to them, allowed the identification of two genomic regions of 30 and 3.9 Mb, respectively, as possible physical locations of the restorer gene Rf1 on linkage group 13. Nine potential candidate genes, encoding six pentatricopeptide repeat proteins, one tetratricopeptide-like helical domain, a probable aldehyde dehydrogenase 22A1, and a probable poly(A) polymerase 3 (PAPS3), were identified in these two genomic regions. Amplicon targeted next generation sequencing of these nine candidate genes for Rf1 was performed in an association panel consisting of 27 maintainer and 32 restorer lines and revealed the presence of 210 Single Nucleotide Polymorphisms (SNPs) and 67 Insertions/Deletions (INDELs). Association studies showed significant associations of 10 SNPs with fertility restoration (p-value < 10−4), narrowing Rf1 down to three candidate genes. Three new markers, one co-dominant marker 67N04_P and two dominant markers, PPR621.5R for restorer, and PPR621.5M for maintainer lines were developed and verified in the association panel of 59 sunflower lines. The versatility of the three newly developed markers, as well as of three existing markers for the restorer gene Rf1 (HRG01 and HRG02, Cleaved Amplified Polymorphic Sequence (CAPS)-marker H13), was analyzed in a large association panel consisting of 557 accessions.


Introduction
In the last 30 years, progress in sunflower hybrid breeding has been accompanied by the development of a large number of molecular markers for disease resistance, quality traits, herbicide resistance, and fertility restoration [1]. Sunflower hybrid production exploits heterotic phenomenon (phenotypic superiority of the progeny over its parents). Hybrid development relies on the use of sugar beet, encoding an OMA1 (overlapping activity with m-AAA protease 1)-like protein that acts post-translational [33,34].
In sunflowers, cytoplasmic male sterility in CMS PET1 is characterized by the expression of CMS-specific 16-kDa-protein encoded by orfH522, which is co-transcribed with atp1 [35][36][37]. Fertility restoration leads to a tissue-specific post-transcriptional reduction of the co-transcript of atp1 and orfH522 in the anthers, thereby reducing the 16-kDa-protein [38,39]. The RNA instability of the co-transcript that allows fertility restoration might be due to an interaction with a PPR protein [19], but also a poly(A) polymerase (PAPS) might be a potential candidate as polyadenylation-assisted RNA degradation plays a major role in post-transcriptional control of expression in plant mitochondria [40,41].
In this publication, we would first like to address new possibilities for developing markers based on the available whole sunflower genome sequence [15], and secondly the versatility of markers in different lines and their usability for marker-assisted breeding. By localizing the restorer gene Rf1 in the sunflower reference genome, we were able to screen the two genomic regions for annotated candidate genes. Nine possible candidates, including PPR genes, were identified and the genetic diversity was captured in an association panel by amplicon-targeted next generation sequencing (NGS). Single Nucleotide Polymorphisms (SNPs) significantly associated with fertility restoration were identified and the first new markers were developed.

Identification of BAC Clones by Hybridization with Cloned Markers for the Restorer Gene Rf1
RAPD-markers (OP-K13_454A, OP-Y10_740A, OP-H13_337A) and AFLP-markers (E41M48_113A, E42M76_125A, E62M52_249A) linked with the restorer gene Rf1 [9,11], as well as the BAC end of 067N04, were used to derive overgo probes for colony hybridizations against two BAC libraries, one of the restorer line RHA 325 [42], and one of the maintainer line HA383 (CUGI). In total, 28 BAC clones were identified: six BAC clones of RHA 325 and 22 BAC clones of HA 383 ( Table 1). The overgo probes for OP-H13_337A and OP-Y10_740A did not give signals with any BAC clones. All identified BAC clones were sequenced with T7 and SP6 to obtain BAC end sequences for the localization of the restorer gene Rf1 by using BLAST against the sunflower reference genome [15]. In addition, larger BAC-end fragments obtained by BamHI digests of the identified BACs were cloned and sequenced. Using BAC end sequence data belonging to the identified BAC clones from the BAC libraries of RHA 325 and HA 383 and sequences of markers linked to the restorer gene Rf1 [9,11], homologies were detected along chromosome 13 of HanXRQ in the genome sequence [15] from position 10,990 to 173,581,392 ( Figure 1). Homologies to SNP marker sequences from the SNP array [43] and Simple Sequence Repeat (SSR) markers [44] were included in the graphic for general orientation. The homologies were not as concentrated as expected, which might be because HanXRQ represents a maintainer line and not a restorer line. Recombination events involving larger genome regions may have occurred in the restorer lines. Using BAC end sequence data belonging to the identified BAC clones from the BAC libraries of RHA 325 and HA 383 and sequences of markers linked to the restorer gene Rf1 [9,11], homologies were detected along chromosome 13 of HanXRQ in the genome sequence [15] from position 10,990 to 173,581,392 ( Figure 1). Homologies to SNP marker sequences from the SNP array [43] and Simple Sequence Repeat (SSR) markers [44] were included in the graphic for general orientation. The homologies were not as concentrated as expected, which might be because HanXRQ represents a maintainer line and not a restorer line. Recombination events involving larger genome regions may have occurred in the restorer lines.  [14], SSR markers in blue [44], and SNP markers in turquoise [43]. The two areas investigated for potential candidate genes for Rf1 are shown as orange bar and white bar.  [14], SSR markers in blue [44], and SNP markers in turquoise [43]. The two areas investigated for potential candidate genes for Rf1 are shown as orange bar and white bar. However, assuming microsynteny, our major interest focused on two regions of 30 Mb and 3.9 Mb, respectively. These areas showed homology to clusters of BAC clones. In addition, the first region showed homology to markers as OP-K13_454 and OP-Y10_750, which have been successfully used in marker-assisted selection for the restorer gene Rf1 [9,10], and to the BAC clone 067N04, which had been successfully back mapped to Rf1. The second area flanked by ORS1030 and OP-H13 [11] represents a small cluster of BACs with homology to the area. Both markers represent co-dominant markers that had also been directly linked to the restorer gene Rf1.

Potential Candidate Genes for Rf1 in the Annotated Sunflower Reference Genome
The two regions on linkage group 13 of the sunflower genome spanning the areas (1) 28,051,124 (067N04_SP6)-58,081,625 (89P04_T7) and (2) 169,655,088 (ORS1030)-173,581,392 (OP-H13) were selected as potential locations of the restorer gene Rf1, and therefore regarded as most promising for future marker developments. Based on the annotation of the HanXRQ genome sequence (https://www.heliagene.org/HanXRQ-SUNRISE/), nine potential candidate genes located within the two genomic regions were selected for next generation sequencing (NGS) by custom targeted amplification (Table 2). These represent six putative pentatricopeptide repeat genes. In addition, a gene for a tetratricopeptide-like helical domain, a probable aldehyde dehydrogenase 22A1, and a probable poly(A) polymerase 3 (PAPS3) were also present in these two areas. The genes, as well as 2,000 bp upstream and 500 bp downstream, were sequenced by NGS in order to obtain information about the coding sequences, as well as about potential regulatory sequences. In total, 58,209 bp covering these genes were selected for sequencing by an NGS approach and analyzed for SNPs and INDELs that can be used for development of markers.

SNP Detection in the Sunflower Association Panel
Amplicon-targeted next generation sequencing of nine potential candidate genes for Rf1 in an association panel consisting of 27 maintainer and 32 restorer lines (Supplementary Table S1) and alignment to the HanXRQ genome sequence identified 277 variants (210 SNPs and 67 INDELs) ( Table 3). The haplotype variation among the maintainer and restorer lines based on the SNPs is shown in Supplementary Table S2. The four genes (HanXRQChr13g0392791, HanXRQChr13g0393411, HanXRQChr13g0394161, HanXRQChr13g0394751) in the large region between 28,051,124 bp (067N04_SP6-58,081,625 bp (89P04_T7) were highly conserved from the sequence and showed only 23 variants (4 SNPs and 19 INDELs). These genes encode two pentatricopeptide repeat proteins, PPR791 and PPR161, and the probable aldehyde dehydrogenase 22A1, as well as the probable poly(A) polymerase 3 (PAPS3). In contrast, the five candidate genes (HanXRQChr13g0418841, HanXRQChr13g0418851, HanXRQChr13g0418861, HanXRQChr13g0419621, HanXRQChr13g0419631) in the 3.9-Mb-region between 169,655,088 bp (ORS1030)-173,581,392 bp (OP-H13) proved to be highly polymorphic, with a total of 254 variants (206 SNPs and 48 INDELs). Even the exon regions still showed 76 SNPs and 8 INDELs. These genes encode four pentatricopeptide repeat proteins, PPR841, PPR861, PPR621 (also named PP198), and PPR631. The fifth gene (HanXRQChr13g0418851) is annotated as a gene encoding a tetratricopeptide-like repeat helical domain. However, the corresponding gene in Arabidopsis (At1g12700) represents a pentatricopeptide repeat protein.

Populations Structure
The population structure of the association panel consisting of 27 maintainer und 32 restorer lines was analyzed based on 34 microsatellite markers using the program STRUCTURE V.2.3.4 to avoid false positive associations due to the population structure. Two microsatellites per chromosome were used to obtain a good coverage of the genome. By use of the Evanno method, two probable subpopulations were identified in the sunflower association panel using STRUCTURE V.2.3.4. (Figures 2 and 3). Twenty-eight sunflower accessions were grouped in the first subpopulation (sp1). The majority of sunflower accessions belonging to this group were restorer lines, except for UGA-SAM1-171 and D-75-10. Thirty-one sunflower accessions were grouped in the second subpopulation (sp2) with the majority being maintainer lines (except for six lines: GN-0778, RHA 282, UGA-SAM1-041, UGA-SAM1-149, UGA-SAM1-181, and UGA-SAM1-278). The mean value of the fixation index (Fst) of the first subpopulation was 0.11, while Fst for the second subpopulation was higher with a value of 0.26.

Populations Structure
The population structure of the association panel consisting of 27 maintainer und 32 restorer lines was analyzed based on 34 microsatellite markers using the program STRUCTURE V.2.3.4 to avoid false positive associations due to the population structure. Two microsatellites per chromosome were used to obtain a good coverage of the genome. By use of the Evanno method, two probable subpopulations were identified in the sunflower association panel using STRUCTURE V.2.3.4. (Figures 2 and 3). Twenty-eight sunflower accessions were grouped in the first subpopulation (sp1). The majority of sunflower accessions belonging to this group were restorer lines, except for UGA-SAM1-171 and D-75-10. Thirty-one sunflower accessions were grouped in the second subpopulation (sp2) with the majority being maintainer lines (except for six lines: GN-0778, RHA 282, UGA-SAM1-041, UGA-SAM1-149, UGA-SAM1-181, and UGA-SAM1-278). The mean value of the fixation index (Fst) of the first subpopulation was 0.11, while Fst for the second subpopulation was higher with a value of 0.26.   The population structure of the association panel consisting of 27 maintainer und 32 restorer lines was analyzed based on 34 microsatellite markers using the program STRUCTURE V.2.3.4 to avoid false positive associations due to the population structure. Two microsatellites per chromosome were used to obtain a good coverage of the genome. By use of the Evanno method, two probable subpopulations were identified in the sunflower association panel using STRUCTURE V.2.3.4. (Figures 2 and 3). Twenty-eight sunflower accessions were grouped in the first subpopulation (sp1). The majority of sunflower accessions belonging to this group were restorer lines, except for UGA-SAM1-171 and D-75-10. Thirty-one sunflower accessions were grouped in the second subpopulation (sp2) with the majority being maintainer lines (except for six lines: GN-0778, RHA 282, UGA-SAM1-041, UGA-SAM1-149, UGA-SAM1-181, and UGA-SAM1-278). The mean value of the fixation index (Fst) of the first subpopulation was 0.11, while Fst for the second subpopulation was higher with a value of 0.26.

Association of SNPs with Fertility Restoration
Association studies were performed using the general linear model (GLM) in TASSEL V5.0, taking into account the population structure (Q) as calculated by STRUCTURE V.2.3.4. Out of 277 variants detected in the sequences of the nine candidate genes for the restorer gene Rf1, 74 SNPs were found to be significantly associated with restoration of fertility at p < 0.05. In the 30-Mb-region, only three SNPs at position 56,431,576, 56,432,393, and 56,433,210, all located in PAPS3, were found to be associated with male fertility at this significance level. The remaining 71 SNPs were all residing in the 3.9-Mb-region. The Manhattan plot shows the distribution of all SNPs in the 3.9-Mb-region ( Figure 4). Twenty-one SNPs still showed associations at p-values lower than 10 −3 (Table 5). After applying the Bonferroni correction, 10 SNPs were finally left above the calculated threshold (p < 1.81 × 10 −4 ). All of these highly significant associated SNPs were located in three of the pentatricopeptide repeat genes in the 3.9-Mb-region (169,655,088-173,581,392 bp): four each were observed in PPR861 and PPR841, but only two in PPR621 (Table 5). These 10 SNPs would be the most interesting ones to be used for marker development. Looking at the haplotype of these with the fertility restoration associated SNPs, 85.2% of the maintainer lines in the association panel showed the same haplotype as HanXRQ (Table 4)

PAMSA Marker System
Polymerase chain reaction (PCR) Amplification of Multiple Specific Alleles (PAMSA) markers represent a SNP-based three primer system, where the addition of a 15-bp-tail at the 5′ end of one of the forward primer allows an allele-specific size differentiation of the PCR products [45]. The PAMSA marker, named 67N04_P, was created for the detection of a G to A mutation in the BAC-end sequence 67N04-B2 BamHI (RHA 325, [13]) observed in comparison to HanXRQ. Detection of this SNP was

PAMSA Marker System
Polymerase chain reaction (PCR) Amplification of Multiple Specific Alleles (PAMSA) markers represent a SNP-based three primer system, where the addition of a 15-bp-tail at the 5 end of one of the forward primer allows an allele-specific size differentiation of the PCR products [45]. The PAMSA marker, named 67N04_P, was created for the detection of a G to A mutation in the BAC-end sequence 67N04-B2 BamHI (RHA 325, [13]) observed in comparison to HanXRQ. Detection of this SNP was performed in agarose gels in order to make it accessible even in simply equipped laboratories. The codominant marker created enabled discrimination between the restorer line RHA 325 and the maintainer line HA 342 due to difference the 15-bp-difference in the forward primer length.  This marker was later tested in the large association panel (see Section 2.4. Versatility of markers for the restorer gene Rf1), in which it amplified a band of 170 bp in lines that contained the nucleotide A (as present in RHA 325) and a band of 155 bp for maintainer carrying the nucleotide G.

New SNP-Based Markers from the Association Analyses
For all 10 SNPs significantly associated with fertility restoration primers were derived and tested in RHA 325 and HA 342 (data not shown), but only for the SNP PPR621.5 reproducible polymerase chain reaction (PCR) products were amplified that differentiated immediately between this restorer and this maintainer line. Two Sequence Characterized Amplified Region (SCAR) markers, PPR621.5M and PPR621.5R, were successfully developed for detection of G for maintainers (PPR621.5M) and C for restorers (PPR621.5R) based on the SNP at position 173,473,513 in the pentatricopeptide repeat candidate gene PPR621 (PP198; HanXRQChr13g0419621). Separate PCR reactions had to be performed for detection of the presence of the G or C nucleotide. In both reactions, a common reverse primer was used for the amplification, while the difference was in the sequence of the allele-specific forward primers. For detection of the G nucleotide using the primer combination PPR621.5_F1 and PPR621.5_Rev However, also in very few cases inconsistencies were observed between the NGS and the PCR results. For the two maintainer lines UGA-SAM1-219 (HA1) and UGA-SAM1-224 (HA 248), a band was amplified by use of PPR621.5_F1 and PPR621.5_Rev, which was designed for detection of the G nucleotide as expected in the reference maintainer genotype and other maintainer accessions, but NGS results had shown the nucleotide C for these two lines. Similarly, the two methods did not give However, also in very few cases inconsistencies were observed between the NGS and the PCR results. For the two maintainer lines UGA-SAM1-219 (HA1) and UGA-SAM1-224 (HA 248), a band was amplified by use of PPR621.5_F1 and PPR621.5_Rev, which was designed for detection of the G nucleotide as expected in the reference maintainer genotype and other maintainer accessions, but NGS results had shown the nucleotide C for these two lines. Similarly, the two methods did not give matching results for the two restorer lines UGA-SAM1-207 (RHA 419) and UGA-SAM1-242 (RHA 355), where the presence of the nucleotide G was established by NGS, and the nucleotide C by PCR. As only four lines showed inconsistencies between NGS and PCR, we observed 93% accordance between the two methods.

Versatility of Markers for the Restorer Gene Rf1
Testing Markers in A Large Association Panel for Their Efficiency A large sunflower association panel containing 557 diverse accessions was used for validation of molecular markers created for the detection of the Rf1 gene (Supplementary Table S3). Within the panel were maintainer sunflower lines with no Rf genes and restorer sunflower lines that harbored different Rf genes. However, bearing in mind that Rf1 is widely used in modern sunflower hybrid production (Yue et al. 2010), it is expected that most restorer lines possess the Rf1 gene. Molecular markers used for validation were: STS-markers HRG01 (derived from RAPD fragment OPK13_454), HRG02 (converted from RAPD fragment OPY10_740) [9], OPH13_337 (CAPS marker H13 Hinf I developed from OPH13 337) [11], orfH522-CMS for CMS PET1, in comparison to the newly described co-dominant PAMSA marker 67N04_P, as well as the SCAR-markers PPR621.5R and PPR621.5M. The banding pattern of the restorer line RHA 325 (UGA-SAM1-114) was used as reference for restorer lines carrying the Rf1 gene. For the maintainer profiles, HA 342 served as reference. The restorer line RHA 280 (UGA-SAM1-008) harboring the Rf3 gene for all markers showed a maintainer profile.
Markers 67N04_P and HRG02 provided the most similar results concerning the marker-trait association (Supplementary Table S3). PAMSA marker 67N04_P amplified a band of 170 bp in 130 accessions (including heterozygous accessions), identifying them as potential restorer lines, while HRG02 amplified a band of approximately 738 bp in 127 sunflower lines (Table 6, Figure 7). For these two markers, profiles obtained matched in 98.4 % of all accessions. HRG01, also previously linked to Rf 1 in RHA 325 [9], amplified a band of 462 bp in 100 sunflower lines of the association panel (Table 6, Supplementary Table S3). CAPS-marker H13 Hinf I proved to be the most inefficient marker for discrimination between restorer and maintainer accessions. HRG02 amplified a band of approximately 738 bp in 127 sunflower lines (  Figure 7). For these two markers, profiles obtained matched in 98.4 % of all accessions. HRG01, also previously linked to Rf1 in RHA 325 [9], amplified a band of 462 bp in 100 sunflower lines of the association panel (Table  6, Supplementary Table S3). CAPS-marker H13 HinfI proved to be the most inefficient marker for discrimination between restorer and maintainer accessions.

Discussion
Marker-assisted assessment of breeding pools includes genotyping of cultivars, determination of purity, and estimation of genetic diversity for selecting parents in breeding programs [46]. Hybrid breeding is most efficient if separate breeding pools for restorer and maintainer lines can be established to maximize use of heterosis (hybrid vigor) [47].
In this study, we focused on the development of new, especially codominant, markers linked to the fertility restorer gene Rf1 in sunflowers, which are needed to develop lines for the restorer pool. To use the information from the sunflower reference genome [15], it was first necessary to locate Rf1 in the genome by sequence homology to known markers and BAC clones linked to the restorer gene [14]. Two regions, (1) 28,051,124 (067N04_SP6)-58,081,625 (89P04_T7) and (2) 169,655,088 (ORS1030)-173,581,392 (OP-H13), were identified as possible locations of the restorer gene Rf1 in the sunflower genome. This first seemed unusual as the regions are 114 Mb apart. However, the homology searches were performed against the sunflower reference genome of the maintainer line HanXRQ. Major recombination events involving large genomic regions may have occurred in the introduction of the restorer gene Rf1 that could perhaps explain two separate regions with homologies. However, false assemblies of contigs on chromosome 13 during the whole genome assembly cannot be excluded either. Whole genome sequencing of additional maintainer and restorer lines and comparison to the reference maintainer genome [15] are necessary and could reveal this. However, another fact might also point in this direction. Huge efforts have been undertaken to develop codominant markers for

Discussion
Marker-assisted assessment of breeding pools includes genotyping of cultivars, determination of purity, and estimation of genetic diversity for selecting parents in breeding programs [46]. Hybrid breeding is most efficient if separate breeding pools for restorer and maintainer lines can be established to maximize use of heterosis (hybrid vigor) [47].
In this study, we focused on the development of new, especially codominant, markers linked to the fertility restorer gene Rf1 in sunflowers, which are needed to develop lines for the restorer pool. To use the information from the sunflower reference genome [15], it was first necessary to locate Rf1 in the genome by sequence homology to known markers and BAC clones linked to the restorer gene [14]. Two regions, (1) 28,051,124 (067N04_SP6)-58,081,625 (89P04_T7) and (2) 169,655,088 (ORS1030)-173,581,392 (OP-H13), were identified as possible locations of the restorer gene Rf1 in the sunflower genome. This first seemed unusual as the regions are 114 Mb apart. However, the homology searches were performed against the sunflower reference genome of the maintainer line HanXRQ. Major recombination events involving large genomic regions may have occurred in the introduction of the restorer gene Rf1 that could perhaps explain two separate regions with homologies. However, false assemblies of contigs on chromosome 13 during the whole genome assembly cannot be excluded either. Whole genome sequencing of additional maintainer and restorer lines and comparison to the reference maintainer genome [15] are necessary and could reveal this. However, another fact might also point in this direction. Huge efforts have been undertaken to develop codominant markers for the Rf1 gene, but only the CAPS marker H13, which maps more than 7 cM away, could be developed. This could be also explained by larger rearrangements around the restorer gene locus.
Association mapping has proven to be a very useful tool in plant breeding to identify marker-trait associations [48]. In sunflowers, association studies have been successfully performed genome-wide for domestication traits [49], as well as based on candidate genes for branching [50], for flowering time [51,52], and for resistance against Sclerotinia sclerotiorum [53]. Here, we report about the first association study based on nine candidate genes for fertility restoration in sunflowers. In the two identified locations in the genome for Rf1, the 30-Mb-region and the 3.9-Mb-region, nine potential candidate genes could be identified. Six of them belonged to the class of pentatricopeptide repeat genes, which are involved in post-transcriptional processes in mitochondria and chloroplasts [18]. A number of restorer genes isolated from different species belong to this large family of PPR genes [16]. The cloned restorer genes are members of a subgroup within the P-type PPR gene family named Restorer of Fertility-Like (RFL) [54], which show "nomadic" characteristics and diversifying selection that allows these PPR genes to adapt fast to interact with newly occurring CMS-specific proteins [19,55]. In contrast, other PPR gene families (non-Rf) tend to be conserved [56]. Three other potential candidates were located in the 30-Mb-region and 3.9-Mb-region: a tetratricopeptide-like repeat helical domain gene, which is closely related to the PPR genes, a probable aldehyde dehydrogenase 22A1 (ALDH22A1), and a probable poly(A) polymerase 3 (PAPS3). Aldehyde dehydrogenases play a role in detoxification of aldehydes. In maize, the Rf2 gene restoring pollen fertility in the presence of the T-cytoplasm proved to be ALDH2B2 [29,30]. ALDH 22A1 (AT3G66658) has been well-characterized in Arabidopsis thaliana, where it showed a cytoplasmic localization and is apart from other tissues, also expressed in anthers [57]. The poly(A) polymerase 3 (PAPS3) also represents a very interesting candidate gene, as polyadenylation at the 3 end of RNAs influences the stability of transcripts. Of the four poly(A) polymerases described in Arabidopsis thaliana, PAPS3 (AT3G06560) represents a cytoplasmic PAPS, which lacks the C-terminus containing the putative nuclear localization domain [58]. PAPS3 is expressed in pollen, implicating an important role in development of male gametes.
Our association studies including nine candidate genes for Rf1 demonstrated that the three pentatricopeptide repeat genes PPR621 (HanXRQChr13g0419621, PP198), PPR841 (HanXRQChr 13g0418841), and PPR861 (HanXRQChr13g0418861), all located in the 3.9-Mb-region (169,655,088-173,581,392), are the most promising candidate genes for the restorer gene Rf1 in sunflowers. These three genes showed the highest numbers of SNPs significantly associated with fertility restoration (p < 1.81 × 10 −4 ). By looking for annotated PPR genes in high Fst-regions of whole sunflower genome sequence data, Owens et al. [59] also narrowed down the area of interest for the restorer gene Rf1 to 10 pentatricopeptide repeat genes. However, due the high stringency, no SNPs were available for the final tagging of the right gene. These 10 candidate genes included four genes (HanXRQChr13g0419621, HanXRQChr13g0419631, HanXRQChr13g0418841, HanXRQChr13g0418861) that we looked at, as well, supporting our choice of candidate genes. We used 120x genome coverage in the amplicon-targeted next generation sequencing, giving us very solid data for our calculation regarding the association of SNPs with fertility restoration. By this we were able to further reduce the number of candidate genes for Rf1 to three (PPR621-HanXRQChr13g0419621, PPR841-HanXRQChr13g0418841, and PPR861-HanXRQChr13g0418861). However, due to the high homologies to restorer sequences in the 30-Mb-region we cannot exclude that in restorer lines additional major recombination events in comparison to HanXRQ might have occurred, introducing other relevant genes near or into this 3.9-Mb-area.
Testing all 10 with fertility restoration associated SNPs, one of the SNPs at position 173,473,513 in PPR621 (HanXRQChr13g0419621) was successfully used to develop two SCAR markers (PPR621.5R and PPR621.5M) that allowed the identification of restorer and maintainer lines in separate PCR reactions. In the near future, this SNP can also be used to develop a codominant PAMSA marker or a Kompetitive Allele Specific Polymorphic (KASP) marker [60,61], which would be very interesting for studies on seed purity. Future work will also show the usefulness of the other nine SNPs associated with fertility restoration for marker development. A recent comparison of the three SNP-based marker systems, namely TaqMan, KASP, and rhAmp, might be helpful in selecting the most efficient marker system regarding cost, sensitivity, and reliability [62].
Versatility of newly developed markers in other restorer lines was the next interesting point to be addressed for plant breeding. All markers for the restorer gene Rf1 in sunflower had so far been developed on the base of segregation in a biparental population [9,11]. Differences that allowed the marker development could have been specific to the cross used. Therefore, it was necessary to test the versatility of these markers in a larger association panel. The newly developed SCAR markers PPR621.5R and PPR621.5M in this study were based on their occurrence in a broader association panel of 59 accessions and should already be more reliable. However, to test the versatility of the markers, we applied them in a large diverse association panel of 557 accessions. Most, but not all lines carrying CMS PET1, showed bands with the STS markers HRG01 and HRG02 for Rf1. One reason for this might be the presence of the fertility restorer gene Rf3 in the association panel, which had been first described in the confectionary line RHA 280 [63]. Rf3 also allows full fertility restoration in the presence of CMS PET1. Markers linked to Rf3, located on linkage group 7, have been identified [64] and could be used to verify this. In addition, Rf5, a new restorer gene also located on LG13, has been detected to be not allelic to Rf1 [65] and could be alternatively present in these lines. In addition, this might also be an explanation for the four restorer accessions (GN-0778, UGA-SAM1-181, UGA-SAM1-199, and UGA-SAM1-278) that did not have the expected C nucleotide at position 173,473,513.
The implementation of markers into breeding programs has been a challenge since the beginning, as researchers and plant breeders have divergent demands and laboratory equipment [46]. In our study, the three newly developed markers PAMSA 67N04_P, PPR621.5R, and PPR621.5M proved to be very valuable tools for marker-assisted selection, because PAMSA 67N04_P represents a co-dominant marker and PPR621.5R and PPR621.5M work in combination, even though they required separate PCR reactions, also allow the identification of heterozygous plants. Hopefully these markers will find their entrance into sunflower hybrid breeding programs as they are easy and reliable to handle in a large variety of sunflower breeding materials.

Plant Material for Association Mapping and Validation
The small association panel of 59 accessions for the NGS approach consisted of 45 accessions that are part of the UGA-SAM1 population (Supplementary Table S1 Table S3) was generated from the same two genetic resources. Field trials were performed near Schlanstedt, Germany, by the company Strube GmbH. Leaves were harvested and stored at −20 • C for DNA extraction. Genomic DNA of the 557 sunflower accessions was isolated according to Doyle and Doyle [67]. Concentration of the isolated DNA was adjusted to 100 ng/µL and was used for amplification with different markers (Supplementary Table S3).
STS markers, HRG01 and HRG02, were amplified as described by Horn et al. [9] for validation in the large association panel. HRG01 marker was used in a duplex PCR with an internal control coxII and annealing temperature set to 60 • C. HRG02 marker was used in a duplex PCR with an internal control atp9 and annealing temperature set to 65 • C. CAPS marker H13 and orfH522 were used as described by Kusterer et al. [11] and Reddemann and Horn [68], respectively. Amplification products were separated on ethidium bromide stained 2% agarose gel (Agarose NEEO ultra-quality Roth ® , Karlsruhe, Germany).

Cloning and Sequencing of Markers Linked to the Restorer Gene Rf1
RAPD-and AFLP-markers were cloned and sequenced as described by Horn et al. [9]. Sequences were obtained by Sanger sequencing using T7 and SP6 primers. The sequences were used to develop overgo probes (Supplementary Table S4) that could be radioactively labelled with 32 P-dATP and 32 P-dCTP to be used as probes for colony hybridization against two BAC libraries (https://www. ncbi.nlm.nih.gov/probe/docs/techovergo/, [14]). One BAC library had been prepared from high molecular weight DNA of the restorer line RHA 325 [42,69]. The second BAC library for the maintainer line HA 383 was obtained from the Clemson University Genomic Institute (CUGI), SC, USA. BAC ends of the positively identified BAC clones were sequenced using T7 and SP6 by CUGI. Larger BAC-ends were cloned by BamHI digests of the original HindIII BAC clones and ligation into the pUC18 vector [13]. These BAC-ends were also Sanger sequenced using SP6, but also applying additional internal primers [14].

Microsatellite Markers
The association panel of 59 accessions was analyzed by 34 SSR loci (Supplementary Table S5). The quality criteria for the SSR markers, also called microsatellite markers, were good amplification products, amplification of one locus, and high polymorphism within the association panel. Most of the microsatellites had been used before in association studies in sunflowers [70,71]. Additional ones were obtained from the publication of the sunflower reference map [44] in order to have two microsatellite markers for each of the 17 chromosomes in sunflower. SSRs were amplified by PCR and separated using the DNA Analyzer 4300 (LI-COR Biosciences, Lincoln, NE, USA), as described in Sajer et al. [72].

PAMSA Marker
Polymerase chain reaction (PCR) Amplification of Multiple Specific Alleles (PAMSA) [44] markers was developed based on a BAC (Bacterial Artificial Chromosome)-end sequence 67N04-B2 BamHI (RHA325) [13,14]. Briefly, three unlabeled primers were designed: two forward and a common reverse primer (Supplementary Table S5). Forward allele-specific primers contained an additional destabilizing mismatch within the first five bases of the 3 end. A mismatch was made by following pattern: A and T → C, G → A, and C → T [73]. In addition, one of the forward primers contained a tail at the 5 end. This was done to enable differentiation between the two alleles based on difference in length of the amplification product. Primers were designed using Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/). PCR mixture of 15 µL was used for amplification with PAMSA marker. The mixture contained 2x PCR buffer, 0.2 µM dNTPs, 0.4 µM of each primer, 2.4 U of Taq DNA polymerase (New England BioLabs ® , Ipswich, MA, USA and 100 ng of genomic DNA. Amplification was performed in Applied Biosystems GeneAmp ® PCR System 2700 Thermal Cycler under following conditions: 3 min denaturation at 94 • C, followed by 40 cycles of 1 min denaturation at 94 • C, 1 min annealing at 65 • C, 30 s polymerization at 72 • C; followed by a 7 min period of elongation at 72 • C. The amplification products were separated on ethidium bromide stained 3% agarose gel (Agarose NEEO ultra-quality Roth ® , Karlsruhe, Germany) at 120 V for 60 min.

SCAR Markers
SNP-based markers were developed for the detection of the SNP G to C at position 173,473,513 of the sunflower genome sequence (https://www.heliagene.org/HanXRQ-SUNRISE/, [15]). This SNP was found in the sequence of the PPR621, a pentatricopeptide repeat candidate gene. Created SNP-based markers, PPR621.5M and PPR621.5R, consist of allele-specific primers, two forward and one common reverse primer (Supplementary Table S6). In addition, a second SNP G to A was present in the forward primer sequence, which was detected at the position 173,473,493 of the sunflower genome (Supplementary Table S2). Primers were designed by use of program Primer3 (http://bioinfo. ut.ee/primer3-0.4.0/). PCR mixture (15 µL) used for amplification was the same as with the PAMSA marker (except that in this reaction only two primers were added to the mixture). Amplification was performed in Applied Biosystems GeneAmp ® PCR System 2700 Thermal Cycler under the following conditions: 2 min denaturation at 94 • C, followed by 30 cycles; 30 sec denaturation at 94 • C, 30 sec annealing at 62 • C, 1 min polymerization at 72 • C; followed by a final 4 min period of elongation at 72 • C. The amplification products were separated on ethidium bromide stained 2% agarose gels (Agarose NEEO ultra-quality Roth ® , Karlsruhe, Germany) at 120 V for 60 min.

Population Structure
Population structure in the sunflower association panel of 59 accessions was analyzed based on microsatellite data (34 SSR primer combinations) by use of the admixture model implemented in STRUCTURE V.2.3.4. [74]. Ten independent replications were performed for the analysis of the mean k-value (an assumed fixed number of subpopulations) from 1 to 10. STRUCTURE was set to 50,000 as burn-in time, followed by 100,000 Markov Chain Monte Carlo (MCMC) iterations [51]. The optimal number of subpopulations was determined by use of STRUCTURE HARVESTER [75] applying the Evanno method [76]. Alignment of replicates obtained as the result of the STRUCTURE analysis for a determined number of assumed subpopulations was performed in CLUMPP [77].

Amplicon Sequencing
DNA extraction, amplicon targeted sequencing, and SNP/INDEL analyses were performed as contract work by LGC Genomics, Berlin, Germany. DNA from sunflower leaves of 59 accessions (Supplemental Table S1) was extracted according to a house-intern protocol. Amplicon-targeted next generation sequencing, including primer designs and library preparations for nine potential candidate genes for the restorer gene Rf1 (Table 2), was performed using the Ovation Custom Target Enrichment System (NuGen Technologies, Tecan Group Ltd., Männedorf, Switzerland). Sequencing was done on a MiSeq V3 subunit (2 × 300 bp) with 120× coverage. Afterwards, all library groups were first demultiplexed using Illumina bcl2fastq 2.17.1.14 software. Here, 1 or 2 mismatches or Ns were allowed in the barcode read when the distances between the barcodes allowed for it. Adapter remnants were clipped from all reads. Read lengths < 100 bases were discarded. A quality trimming of the adapter clipped Illumina reads followed. This included trimming of reads at the 3 -end to get a minimum average Phred quality score > 20 over a window of 10 bases. Reads with final length < 20 bases were discarded. Quality reports were generated for all FASTQ files by the FastQC software. In addition, a read_counts.xlsx file containing all read counts for all samples was generated. Read counts for the SNPs in the nine genes of the 59 genotypes by NGS are shown in Supplementary Table S7.

Association Studies
Marker-trait associations were made by jointly analyzing genotypic and phenotypic data. Association studies were performed with the software package TASSEL V5.0 [78] by use of the General Linear Model (GLM). Minor allele frequency (MAF) > 0.05 was used to filter SNPs prior to analysis, while Q was extracted from the results of the previous population structure analysis (CLUMPP). Bonferroni correction was calculated by dividing 0.05 with the number of total SNPs detected. It was applied for defining a significance threshold of p ≤ 1.81 × 10 −4 for association between detected SNPs and fertility restoration. The Manhattan plot was generated in Excel based on p-values obtained from TASSEL. For the Manhattan plot, the negative logarithm of the p-values [−log 10 (P)] from the association studies were plotted against the genomic locations of the SNPs.

Conclusions
The publication of the reference sunflower genome [15] represents a milestone in sunflower breeding, as it has opened up new and fast possibilities to develop markers for breeding purposes, which we here demonstrated for the development of markers for the restorer gene Rf1. After identification of two potential regions for the localization of the restorer gene Rf1, potential candidate genes could be addressed in the annotated genome and used for amplicon-targeted next generation sequencing. Association studies using the identified SNPs detected significant associations with three PPR genes (HanXRQChr13g0419621, HanXRQChr13g0418841, and HanXRQChr13g0418861). One of these might encode the Rf1 gene. The SNPs now represent a valuable tool for the development of further markers linked to the restorer gene Rf1, which we proved to be possible by successfully designing and testing the new SCAR-markers PPR621.5R and PPR621.5M, specific for restorer or maintainer lines, respectively.
For a broad application of markers linked to the restorer gene Rf1, it is necessary that these markers are not restricted to the biparental population they have been developed in, but are applicable in a wide spectrum of lines. Markers developed based on association, per se, will have a broader possibility of applications. We here successfully tested formerly published markers linked to the restorer gene Rf1, as well as three newly developed markers in a diverse association panel of 557 accessions, and could observe wide versatility in these markers. Especially, the three newly developed markers (co-dominant PAMSA 67N04_P and SCAR markers PPR621.5R and PPR621.5M) represent valuable tools for marker-assisted selection in sunflower hybrid breeding. Author Contributions: R.H. cloned and sequenced markers linked to the restorer gene Rf1, selected the nine candidate genes, ordered the amplicon targeted sequencing, worked on the SNP analyses, participated in the association mapping as well as population structure analyses, general planning of the project, and participation in writing the manuscript and the marker development. A.R. isolated DNA from the large association panel, performed the population structure as well as the association mapping, developed the PAMSA marker based on sequences made available to her, as well as the SCAR marker. L.F. performed the BLAST against the