Genome-Wide Association Studies for Sex Determination and Cross-Compatibility in Water Yam (Dioscorea alata L.)

Yam (Dioscorea spp.) species are predominantly dioecious, with male and female flowers borne on separate individuals. Cross-pollination is, therefore, essential for gene flow among and within yam species to achieve breeding objectives. Understanding genetic mechanisms underlying sex determination and cross-compatibility is crucial for planning a successful hybridization program. This study used the genome-wide association study (GWAS) approach for identifying genomic regions linked to sex and cross-compatibility in water yam (Dioscorea alata L.). We identified 54 markers linked to flower sex determination, among which 53 markers were on chromosome 6 and one on chromosome 11. Our result ascertained that D. alata is characterized by the male heterogametic sex determination system (XX/XY). The cross-compatibility indices, average crossability rate (ACR) and percentage high crossability (PHC), were controlled by loci on chromosomes 1, 6 and 17. Of the significant loci, SNPs located on chromosomes 1 and 17 were the most promising for ACR and PHC, respectively, and should be validated for use in D. alata hybridization activities to predict cross-compatibility success. A total of 61 putative gene/protein families with direct or indirect influence on plant reproduction were annotated in chromosomic regions controlling the target traits. This study provides valuable insights into the genetic control of D. alata sexual reproduction. It opens an avenue for developing genomic tools for predicting hybridization success in water yam breeding programs.


Introduction
Yam (Dioscorea spp.) is an important food and cash crop in tropical and subtropical areas [1]. It is extensively produced (~93% of world production) in the African yam belt, a six-country region from Cameroon to Côte d'Ivoire, where it plays significant economic, sociocultural, and religious roles among ethnic groups [2]. Dioscorea alata, commonly referred to as water, winged or greater yam, is the most widely distributed and the secondmost-produced yam species after D. rotundata worldwide [3]. The popularity of D. alata stems from its high yield potential (even under low soil fertility), ease of propagation, competition with weeds (early vigor) and tuber storability [4,5]. Yam yield has, however, remained low over time because of several biotic (diseases and pests), abiotic (drought, low soil fertility, etc.), and agronomic constraints [6,7]. Developing resistant/tolerant varieties coupled with a robust seed delivery system could be an effective means of raising yields of predominantly resource-poor farmers characterized by low use of external farm inputs. The variety development process requires a thorough understanding of the crop's reproductive mechanisms.

Chromosomic Regions Linked to D. alata Sex Determination and Cross-Compatibility
The GWAS scan identified 54 SNP markers associated with variation for flower sex; 53 of these markers were located on chromosome 6 while one was on chromosome 11 (Table 2, Figure 1a). Of the total SNP markers associated with plant sex, the minor allele frequencies (MAF) ranged from 0.13 (Chr6_837364 and Chr6_843525) to 0.43 (Chr6_3465 and Chr6_53812). The total phenotypic variance explained (PVE) by inventoried SNP markers was high (49-86%). The marker effects ranged from −1.92 to 1.77. The logarithm of odd (LOD)-scores varied from 4.47 to 9.69 for sex markers (Table 2).  Three SNP markers distributed on three chromosomes ( Figure 2a, Table 2) were identified as responsible for the genotype's average crossability rate (ACR). Chr6_3161 is located at 3 kilo-base pairs (kbp) on chromosome 6 while the SNP Chr1_215056 on chromosome 1 is located at 21 kbp and Chr17_9492 on chromosome 17 at 9 kbp. PVE ranging from 32 to 35% was observed, with minor allele frequencies of 0.25-0.35, and the marker effects were from -20.47 to 14.02 (Table 2). Two markers were found for the percentage high crossability (PHC) on chromosomes 1 and 6 ( Figure 3a). The marker Chr1_215056 was from chromosome 1, at the physical position of 215 kbp, it explained 29% of the phenotypic variance, had a marker effect of -43.11 and a LOD-score of 4.01. This marker's MAF was 0.03. On the other hand, the marker Chr6_3227 was retrieved at the position 3 kbp on chromosome 6. Its MAF was Three SNP markers distributed on three chromosomes ( Figure 2a, Table 2) were identified as responsible for the genotype's average crossability rate (ACR). Chr6_3161 is located at 3 kilo-base pairs (kbp) on chromosome 6 while the SNP Chr1_215056 on chromosome 1 is located at 21 kbp and Chr17_9492 on chromosome 17 at 9 kbp. PVE ranging from 32 to 35% was observed, with minor allele frequencies of 0.25-0.35, and the marker effects were from −20.47 to 14.02 (Table 2).  Three SNP markers distributed on three chromosomes ( Figure 2a, Table 2) were identified as responsible for the genotype's average crossability rate (ACR). Chr6_3161 is located at 3 kilo-base pairs (kbp) on chromosome 6 while the SNP Chr1_215056 on chromosome 1 is located at 21 kbp and Chr17_9492 on chromosome 17 at 9 kbp. PVE ranging from 32 to 35% was observed, with minor allele frequencies of 0.25-0.35, and the marker effects were from -20.47 to 14.02 (Table 2). Two markers were found for the percentage high crossability (PHC) on chromosomes 1 and 6 ( Figure 3a). The marker Chr1_215056 was from chromosome 1, at the physical position of 215 kbp, it explained 29% of the phenotypic variance, had a marker effect of -43.11 and a LOD-score of 4.01. This marker's MAF was 0.03. On the other hand, the marker Chr6_3227 was retrieved at the position 3 kbp on chromosome 6. Its MAF was Two markers were found for the percentage high crossability (PHC) on chromosomes 1 and 6 ( Figure 3a). The marker Chr1_215056 was from chromosome 1, at the physical position of 215 kbp, it explained 29% of the phenotypic variance, had a marker effect of −43.11 and a LOD-score of 4.01. This marker's MAF was 0.03. On the other hand, the marker Chr6_3227 was retrieved at the position 3 kbp on chromosome 6. Its MAF was 0.26, and it explained 29% of the phenotypic variance. The marker effect and LOD-score were −27.36 and 4.04, respectively (Table 2). 0.26, and it explained 29% of the phenotypic variance. The marker effect and LOD-score were -27.36 and 4.04, respectively (Table 2).

Analysis of the Sex Determination System
The haplotype view of markers associated with plant sex in female and male plants of D. alata showed that the sex is controlled by the male parent (XY) since the females were 95.9% homozygous (XX) for markers linked to sex determination (Figure 4, Supplementary Table S1). In contrast, markers linked with plant sex displayed 84.96% heterozygosity in the male genotype population ( Figure 5, Supplementary Table S2).  The quantile-quantile (Q-Q) plots generated by plotting the negative logarithms (−log10) of the p-values against their expected p-values showed appropriateness of the GWAS model for all the three traits. There was an inflection between observed and expected values for target traits, thus supporting association between the phenotype and markers (Figures 1b, 2b and 3b).

Analysis of the Sex Determination System
The haplotype view of markers associated with plant sex in female and male plants of D. alata showed that the sex is controlled by the male parent (XY) since the females were 95.9% homozygous (XX) for markers linked to sex determination ( Figure 4, Supplementary  Table S1). In contrast, markers linked with plant sex displayed 84.96% heterozygosity in the male genotype population ( Figure 5, Supplementary Table S2). 0.26, and it explained 29% of the phenotypic variance. The marker effect and LOD-score were -27.36 and 4.04, respectively (

Analysis of the Sex Determination System
The haplotype view of markers associated with plant sex in female and male plants of D. alata showed that the sex is controlled by the male parent (XY) since the females were 95.9% homozygous (XX) for markers linked to sex determination (Figure 4, Supplementary Table S1). In contrast, markers linked with plant sex displayed 84.96% heterozygosity in the male genotype population ( Figure 5, Supplementary Table S2).

Haplotype Segregation for ACR and PHC
The haplotype segregation showed that among the three markers identified as associated with ACR, Chr17_9492 was the most promising in discriminating genotypes for pollination success (p < 0.05). Of the three variants (TT, CT and CC), the variant CC was associated with low ACR (Figure 6, Table 3). On the other hand, CT and TT were identified as predictors of genotypes with high ACR. The other two SNP markers for ACR showed no significant effects among the different variants. The marker Chr1_215056 allowed discrimination for the PHC: the haplotype AA was associated with high PHC, while the haplotype AG controlled low PHC (Figure 7, Table 3).

Haplotype Segregation for ACR and PHC
The haplotype segregation showed that among the three markers identified as associated with ACR, Chr17_9492 was the most promising in discriminating genotypes for pollination success (p < 0.05). Of the three variants (TT, CT and CC), the variant CC was associated with low ACR (Figure 6, Table 3). On the other hand, CT and TT were identified as predictors of genotypes with high ACR. The other two SNP markers for ACR showed no significant effects among the different variants. The marker Chr1_215056 allowed discrimination for the PHC: the haplotype AA was associated with high PHC, while the haplotype AG controlled low PHC (Figure 7, Table 3).

Putative Gene Annotation Linked to Flower Sex and Cross-Pollination
Inventoried gene or protein families with any association to plant flowering and reproduction are presented in Supplementary Table S3. We identified four gene/protein families in chromosomic regions associated with ACR: Homeobox domain, Helix-turn-helix motif, NAC domain and Zinc finger CCHC-type protein. Twelve gene families previously reported for their involvement in plant reproduction in other crops were found in chromo-somic regions associated with the PHC. Of these, WD40 repeat G-protein, ubiquitin-protein ligase SINA like, Seven-in-absentia protein (TRAF-like domain), P-loop containing nucleoside triphosphate hydrolase and Proteasome component (PCI) domain are the most promising candidates. On the other hand, we identified 45 different gene/protein families with links to plant reproduction in chromosomic regions controlling plant sex determination. Among them, the most promising candidates were Zinc finger (RING/FYVE/PHD-type), Glycosyltransferase AER61, NAD(P)H-quinone oxidoreductase subunit L, GLABROUS1 enhancer-binding protein (GeBP) and GeBP-like proteins, Auxin efflux carrier, Ribosomal protein PSRP-3/Ycf65, MADS-box, RNA recognition motif domain, Basic-leucine zipper domain, Myb/SANT-like domain, Aldolase-type TIM barrel, NAD(P)-binding domain, Zinc finger (Rad18-type putative), Tify, ABC transporter-like, Homeodomain-like, Prohibitin, Initiation factor eIF-4 gamma (MA3) and HD-ZIP protein (N-terminal).

Genomic Regions Controlling Sex Determination, ACR and PHC Are on the Same Chromosomes
An accurate method for early detection of seedling sex and compatible fertile parents prior to designing crosses would be helpful to improve cross-pollination success in yam breeding. We used GWAS method to investigate genomic regions controlling sex determination, ACR and PHC in D. alata. A total of 54 markers were identified for sex determination, 53 of these were mapped on chromosome 6 and one on chromosome 11. Besides, the gene annotation showed many gene/protein families previously involved in flowering and reproduction in other crop species on those chromosomes, especially chromosome 6. Our findings agree with previous sex determination and flowering behavior studies on D. alata. Previous reports demonstrated an involvement of chromosome 6, and secondary chromosomes 1 and 11, on D. alata flowering and sexual reproduction [3,5]. As hypothesized by Denadi et al. [20], Dioscorea spp. flowering and sex might be controlled by several genes as our results also supported. For instance, we have identified up to 54 SNP markers on chromosomes 6 and 11 for the flower sex determination. This implies that additional SNP markers should be developed for sex identification and deployed in yam breeding programs to complement efforts by Cormier et al. [5].
This study investigated for the first time the association between chromosomic regions and successful pollination rates (represented in this work by two indices: ACR and PHC) in yam. The GWAS output showed that the ACR was controlled by chromosomes 1, 6 and 17, while the PHC was associated with genes on chromosomes 1 and 6. We can, thus, conclude that chromosomes 1 and 6 are the major contributors to D. alata flower sex and cross-pollination success. Being controlled by the same chromosomes, there might be a probable correlation between flowering ability, sex determination and pollination success in water yam. If the correlation is confirmed, it would be possible to select for these traits simultaneously. Although not associating flower sex with chromosome 1, Cormier et al. [3] showed that this chromosome is involved in D. alata flowering ability. Other studies also found a relationship between genes controlling reproduction traits such as flowering time, behavior and sex in plants [21].
Of the identified markers, Chr17_9492 and Chr1_215056 were the most promising for ACR and PHC predictions, respectively. This work, therefore, opens an avenue for improving hybridization practices in water yam by providing molecular markers for sex determination (crucial for effective hybridization plans in dioecious plants) and the crossability potential of parents to be involved in breeding programs.

Dioscorea alata Flower Sex Is Controlled by a Male Heterogametic System
The haplotype analysis showed that sex in D. alata is determined by the male parent. The females were at~96% homozygous for markers linked to sex determination while males were~85% heterozygous for this trait. Our results, using GWAS approach, Diversity Arrays Technology (DArT) and a core collection from West Africa, supported previous reports on D. alata sex determination which showed that this species is characterized by a male heterogametic sex determination system [3,5]. In such a system, XX determines female sex phenotype and XY the male sex phenotype [3,5]. It is noteworthy that previous studies on D. alata sex determination used either bi-parental populations [5] or GWAS with genotyping-by sequencing (GBS) and a core collection from the French West Indies (Guadeloupe) in Latin America [3]. Besides, Cormier et al. [3] aligned raw sequencing reads on the D. rotundata reference genome v1 [9] to detect SNPs, while our study used the newly released D. alata reference genome [29]. Using a different approach and plant material, our study is, therefore, strengthening conclusions on D. alata sex determination as reported by previous studies. Efforts should be concentrated on the SNP markers which displayed 100% homozygosity for female genotypes, while markers with 100% heterozygosity record should be selected for future studies, such as marker conversion into KASP-PCR, validation and deployment in the breeding program for marker-assisted selection.
As also reported by Cormier et al. [3], we observed a certain level of mismatch between the genetic information and the sex phenotype of some D. alata cultivars, such that a genotype with male haplotypes could display a female phenotype and vice versa. These authors hypothesized that the ploidy level could possibly explain that mismatch. They argued that the polyploidy leads to major changes in gene regulation and expression, as also supported by Chen [30]. Therefore, efforts are necessary to elucidate the extent of the influence of ploidy level on flower sex prediction in D. alata.
The presence of gene/protein families regulating hormones such as gibberellins, auxins, ethylene and cytokinins in the sex-determining regions (Supplementary Table S3) could confirm the crucial roles played by these phytohormones for sex determination in dioecious and monoecious plants. Generally, auxins and ethylene have feminizing effects, whereas cytokinins and gibberellins have masculinizing effects [15,21,[31][32][33]. A better understanding of the hormone balance for a sex phenotype display could facilitate manipulation of flowering behaviors and sex ratios in D. alata.
It is noteworthy that the equilibrium sex ratio of 1:1 expected from the Fisherian theory is seldom respected in Dioscorea species as there is a significant male bias [8,11,34]. The frequent occurrence of male-biased sex ratios in the plant has been associated with three factors: (i) greater female than male reproductive expenditure, (ii) greater sensitivity of females to stress and (iii) spatial segregation of the sexes as a result of resource gradients. Thus, the high reproductive investment required makes females more sensitive to internal (genetic) and exogenous (ecological) conditions affecting plant reproductive activities [12,19,35].

Gene Annotation Showed the Presence of Gene/Protein Families with Links to Plant Sex and Cross-Pollination
We identified a total of 61 gene/protein families with links to plant flowering and reproduction. Their specific functions, crops in which they were reported and corresponding references are provided in Supplementary Table S3.
Further investigations are, thus, necessary to determine which of these candidate gene/protein families are directly involved in yam sex determination and crosscompatibility. This gene profiling will be useful in identifying candidate genes that can be targeted for further validation in the attempt to control flower sex and cross-pollination in yam breeding programs.

Plant Materials and Breeding Sites
Plant materials used for GWAS analyses consisted of 74 D. alata clones (33 females and 41 males) involved in hybridization activities at the International Institute of Tropical Agriculture (IITA), Nigeria, from 2010-2020. These clones included breeding lines and local landraces (Table 1). It is noteworthy that these clones were part of the 100 water yam genotypes sequenced and presented in Gatarira et al. [27] and Agre et al. [28]. The IITA yam crossing blocks in Nigeria are established at Ibadan (7 • 29 N and 3 • 54 E) and Abuja (9 • 10 N and 7 • 21 E). In Nigeria, yam fields are planted in April-May, and the harvest occurs in December-January. Soil and weather conditions at these breeding sites are presented in Supplementary Table S4.

Flower Sex Phenotyping
Flower sex phenotype was assessed visually at flowering. Dioscorea alata male and female flowers differ morphologically in shape and size, female flowers being larger than male counterparts (Figure 8). Historical data, collected from IITA crossing blocks from 2010-2020, allowed identifying yam clones' sex phenotypes. The sex phenotype was scored as: 1 = non-flowering, 2 = male, 3 = female as described in the yam crop ontology [66]. Although D. alata is strictly dioecious (no monoecious cases reported), it experiences irregular/erratic flowering like other yam species, such that a genotype may flower or not in a particular year [8,11]. For convenient analyses, we only focused on genotypes with stable/regular flowering over the considered period. The sex information of genotypes used in this study is presented in Table 1.

Genotypes' ACR and PHC Assessment
Calculation procedures for ACR, crossability rate and PHC were adopted from Mondo et al. [22]. The average crossability rate (ACR) was calculated using 2010-2020 historical data from IITA yam crossing blocks at Ibadan and Abuja stations, Nigeria. The ACR consisted of dividing the sum of means of a genotype's crossability rates by the number of cross-combinations in which the genotype was involved from 2010-2020: ACR = ∑ Crossability rates Number of cross combinations (1) In Equation (1) Percentage high crossability (PHC) for a parent was calculated as the number of times the crossability rate exceeded the species overall cross-compatibility, divided by the number of cross-combinations in which that parental genotype was involved: Number of crossability rates > overall species mean Number of cross combinations × 100 Based on a previous report, the overall mean crossability rate for D. alata is 31.7% [22]. The pollination information (ACR and PHC) of genotypes used in this study is presented in Table 1. This information was summarized using a cross-tabulation function implemented in Microsoft Excel. Flower sex phenotype was assessed visually at flowering. Dioscorea alata male and female flowers differ morphologically in shape and size, female flowers being larger than male counterparts (Figure 8). Historical data, collected from IITA crossing blocks from 2010-2020, allowed identifying yam clones' sex phenotypes. The sex phenotype was scored as: 1 = non-flowering, 2 = male, 3 = female as described in the yam crop ontology [66]. Although D. alata is strictly dioecious (no monoecious cases reported), it experiences irregular/erratic flowering like other yam species, such that a genotype may flower or not in a particular year [8,11]. For convenient analyses, we only focused on genotypes with stable/regular flowering over the considered period. The sex information of genotypes used in this study is presented in Table 1.

Genotypes' ACR and PHC Assessment
Calculation procedures for ACR, crossability rate and PHC were adopted from Mondo et al. [22]. The average crossability rate (ACR) was calculated using 2010-2020 historical data from IITA yam crossing blocks at Ibadan and Abuja stations, Nigeria. The ACR consisted of dividing the sum of means of a genotype's crossability rates by the number of cross-combinations in which the genotype was involved from 2010-2020:

DNA Extraction, Library Construction and SNP Calling
For each genotype, we collected about one gram of fresh, healthy and young leaves from a field-grown plant and immediately placed the sample on dry ice. The leaf samples were then lyophilized and kept at ambient room temperature (~25 • C). Deoxyribonucleic acid (DNA) was extracted from lyophilized leaf samples using the cetyltrimethylammonium bromide (CTAB) protocol [67] with slight changes. The DNA quality was assessed on 0.8% agarose gel and concentration was estimated using nanodrop (Amersham Bioscience, Piscataway, NJ, USA) following the manufacturer's directives. Subsequently, 50 µL of 50 ng/µL diluted DNA of each genotype was prepared and sent to Diversity Arrays Technology (DArT) Pty Ltd., Australia, for sequencing-based DArT genotyping using the DArT marker procedure described by Agre et al. [28]. Complexity reduction methods optimized for yam at DArT were used: PstI_ad/TaqI/HpaII_ad with TaqI restriction enzyme used to eliminate a subset of PstI-HpaII. PstI-site specific adapter was tagged with 102 different barcodes enabling encoding a plate of DNA samples to run within a single lane on an Illumina GAIIx. After the sequencing, FASTQ files generated by DArT were aligned against the newly released D. alata genome reference [29]. Variants (SNP markers) were called using the DArT's proprietary software, DArTSoft, as previously described [26] and a single row format was generated. Finally, hapmap and VCF files were developed from the single row format and used for the final analysis.

Genotypic Data Analysis
Multiple sequences were generated by the DArTSeq platform using proprietary analytical pipelines (Diversity Array Technology, Canberra, Australia) and mapped to the D. alata v2 reference genome [29]. This produced a raw dataset (single row format) of 22,140 SNPs that were subjected to SNP markers filtering with the following criteria: markers with low sequence depth < 5; SNP markers with missing values > 20%; minor allele frequency (MAF) < 0.05; genotype quality < 20; and heterozygosity > 50. This quality control filtering resulted in 9687 good-quality SNPs distributed across the 20 chromosomes [27].

GWAS Analysis and Identification of Putative Genes
A compressed mixed linear model (CMLM) implemented in the GAPIT R package was used to compute associations using the mixed model y = Xb + Zu + e [68], where y is the vector of the phenotypic observations estimated for the ACR and the PHC, X represents the SNP markers (fixed effect), Z is the random kinship (co-ancestry) matrix, b is a vector representing the estimated SNP effects, u is a vector representing random additive genetic effects, and e is the vector for random residual errors.
A co-ancestry matrix from principal component representing the possible diversity subgroup and kinship was included as covariates in the GWAS model to account for population structure and familial relatedness, respectively, to reduce spurious associations. The Manhattan plot was also generated in R/CMplot to visualize GWAS results over the entire genome [69] using the GWAS output from GAPIT. The phenotypic variation explained by the model for a trait and a particular SNP was determined using stepwise regression implemented in lme4 R package. The SNP loci with significant association with the traits were determined by adjusted p-value using Bonferroni correction [70].
Quantile-quantile (QQ) plots were generated by plotting the negative logarithms (−log10) of the p-values against their expected p-values to fit the appropriateness of the GWAS model with the null hypothesis of no association and to determine how well the models accounted for the population structure.
To inventory potential putative genes in the vicinity of associated SNP markers for target traits, we defined a window range of 1 Mb (upstream and downstream) and genes were searched from D. alata generic feature format (GFF3) of the reference genome. Public database Interpro, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) allowed us to determine the functions of the genes associated with the different SNPs identified. A Google Scholar search allowed us to obtain more information on already known identified gene or protein families. For the sex determination, two sets of genotypes were developed, male and female, and the hapmap file of associated SNP markers was developed and viewed in Tassel 5. Proportion of heterozygosity and homozygosity level was estimated across the male and female genotypes.

Conclusions
This study showed the potential of the GWAS in identifying chromosomal regions associated with sexual reproduction in D. alata. There is a probable association between the sex determination, ACR and PHC, since they are all controlled by the same chromosomes. Haplotype analysis confirmed the male heterogametic sex determination system for D. alata. This species' reproduction traits could be controlled by multiple genes. We identified promising SNP markers for sex determination, ACR and PHC, which could be used in marker-assisted selection in yam breeding.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants10071412/s1, Table S1: Haplotype view of markers associated with plant sex in female plant of D. alata, Table S2: Haplotype view of markers associated with plant sex in male plant of D. alata, Table S3: Candidate gene/protein families annotated in regions controlling target traits, Table S4: Weather and soil variables of the IITA yam breeding sites.