Article

Genomic Analysis of Resistance to Exserohilum turcicum in Nigerien and Senegalese Sorghum Using GWAS and Machine Learning

1
USDA-ARS, Southern Plains Agricultural Research Center, 2765 F & B Road, College Station, TX 77845, USA
2
USDA-ARS Sustainable Perennial Crops Laboratory, Beltsville Agricultural Research Center, Beltsville, MD 20705, USA
3
Department of Plant Pathology, University of Minnesota, St. Paul, MN 55108, USA
4
Department of Plant Pathology and Microbiology, Texas A&M University, College Station, TX 77843, USA
*
Author to whom correspondence should be addressed.
Pathogens 2026, 15(4), 389; https://doi.org/10.3390/pathogens15040389
Submission received: 24 February 2026 / Revised: 24 March 2026 / Accepted: 31 March 2026 / Published: 5 April 2026
(This article belongs to the Special Issue Emerging and Rare Fungal Pathogens in a Changing World)

Abstract

Sorghum, an essential crop in Niger, ranks second to pearl millet in importance for food, feed, and commerce. However, its yields are hindered by various factors, including diseases such as leaf blight caused by Exserohilum turcicum. In this study, field phenotypes were analyzed on 102 accessions (including checks SC748-5 and BTx623) grown and evaluated at two locations in Niger for leaf blight incidence and severity. The panel included accessions originally collected from Niger and Senegal. Genotypes were generated for 120 accessions, and GWAS/ML analyses were performed on 102 accessions due to missing phenotypic data. Among the accessions, S39, N23, and N38 exhibited mean leaf blight incidence below 50%, while S3, S43, N23, and N38 displayed the lowest severity levels, with a mean severity in Niger of 24.5 ± 0.64. Accession N23 showed relatively low incidence and severity levels across the Niger field evaluations. Using genome-wide association studies and machine learning, candidate SNPs associated with leaf blight phenotypes were identified. Genes near these SNPs were associated with functions related to plant defense mechanisms and stress responses, providing preliminary targets for future validation in sorghum leaf blight studies.

1. Introduction

Leaf blight of sorghum is incited by Exserohilum turcicum (Pass.) K. J. Leonard & E. G. Suggs [syn. Helminthosporium turcicum (Pass.)], which also infects maize, sudangrass, johnsongrass, teosinte, and gamagrass [1]. However, two host-specific forms have been described: E. turcicum f. sp. zeae, which infects maize, and E. turcicum f. sp. sorghi, which infects sorghum [2,3]. Host specificity of the pathogen is attributed to a single gene, SorA+ in sorghum and ZeaA+ in maize [3]. Sorghum leaf blight affects both young and older plants. Symptoms range from large, elongated to spindle-shaped spots with yellowish to gray centers and reddish margins [1,4]. Sorghum leaf blight is widely distributed in areas where sorghum is planted, and under severe foliar infection, losses of up to 70% can occur [1,5,6,7]. Prom et al. [7,8] observed several prevalent diseases, including leaf blight, in both Niger and Senegal. In Niger and other African countries, prevalence of leaf blight in production fields can range from 12% to 100% [5,7]. Surveys of foliar and panicle sorghum diseases during the 2019 and 2022 growing seasons in Niger and Senegal revealed 89% and 100% prevalence of leaf blight, respectively [7,8]. Meanwhile, the incidence of leaf blight among seven regions in Senegal surveyed for sorghum diseases ranged from 22% in Diourbel to 81% in Kolda [9]. Beshir et al. [5] recorded 40 to 100% leaf blight incidence on landraces planted in central Sudan. A survey conducted in Kenya by Ogolla et al. [10] documented leaf blight incidence ranging from 12% to 74% on sorghum planted in different regions.
Sources of resistance to sorghum leaf blight have been reported [5,6,11]. Hepperly and Sotomayor-Ríos [11] reported that a single dominant gene conferred resistance to leaf blight in SC0326. Beshir et al. [12] evaluated progenies from a cross of MUC007/009 (resistant parent) × Epuripuri (susceptible parent) and noted that the resistance to leaf blight was quantitative.
Recently, molecular tools such as genome-wide association studies (GWAS) have been utilized to identify SNPs and locate genes associated with economically important traits such as disease resistance. In addition to evaluating the Nigerien and Senegalese sorghum accessions for incidence and severity of leaf blight in two locations in Niger (Bengou and Maradi), this work also reports the results of GWAS and a CatBoost-based machine learning (ML) approach to identify candidate SNP markers associated with leaf blight incidence and severity. CatBoost was selected due to its robustness in high-dimensional settings, built-in regularization, and ability to efficiently handle mixed feature representations without extensive preprocessing [13].

2. Materials and Methods

Study area: The research fields were established in Niger, West Africa, during the 2022 growing season. Niger lies at 17°35′48.77″ N latitude and 8°04′58.26″ E longitude [14]. In Niger, accessions were planted in two locations: (1) Bengou, in the southern Dosso region (13°02′56.40″ N, 03°11′37.32″ E), with a Sahelian and Sahelo-soudanian climate, and (2) Maradi, in the Maradi region (13°30′0.00″ N, 07°06′06.26″ E), with a Sahelian climate [15]. The soil type in the Maradi region is ferruginous, while the soil type in the Dosso region is hydromorphic [16]. A total of 102 accessions from Niger and Senegal, including checks SC748-5 and BTx623, were planted at each location. Seeds of each accession were planted in 1.8 m rows with 0.8 m row spacing at each location. Accessions were planted in a randomized complete block design, and each accession was replicated three times. Fields were kept free of weeds with occasional hand hoeing. Plants were evaluated for leaf blight incidence in the soft to early hard dough stage of development. Leaf blight incidence was calculated using the formula below.
$$\text{Incidence (\%)} = \frac{\text{Number of plants with the disease in a row}}{\text{Number of plants assessed in a row}} \times 100$$
Disease Severity Scale: The severity scale was previously described by Prom et al. [7,17] and is based on ratings from 0 to 11, each converted to a percentage mid-point (1 = 5.5, 2 = 15.5, 3 = 25.5, 4 = 35.5, 5 = 45.5, 6 = 55.5, 7 = 65.5, 8 = 75.5, 9 = 85.5, 10 = 95.5, and 11 = 100); the mid-points were used to calculate the mean severity.
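As an illustration of the two field measurements, the following minimal Python sketch computes row-level incidence from the formula above and a mean severity from the listed mid-points. The function names are hypothetical, and mapping a rating of 0 to 0% is an assumption, since the text gives mid-points only for ratings 1 to 11.

```python
# Percentage mid-points for severity ratings 1-11, as listed in the text.
SEVERITY_MIDPOINTS = {
    1: 5.5, 2: 15.5, 3: 25.5, 4: 35.5, 5: 45.5, 6: 55.5,
    7: 65.5, 8: 75.5, 9: 85.5, 10: 95.5, 11: 100.0,
}
# Assumption: a rating of 0 (no symptoms) corresponds to 0% severity.
SEVERITY_MIDPOINTS[0] = 0.0


def incidence_percent(n_diseased, n_assessed):
    """Percentage of assessed plants in a row that show the disease."""
    if n_assessed <= 0:
        raise ValueError("at least one plant must be assessed")
    return 100.0 * n_diseased / n_assessed


def mean_severity(ratings):
    """Average the percentage mid-points of a list of 0-11 ratings."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(SEVERITY_MIDPOINTS[r] for r in ratings) / len(ratings)
```

For example, a row with 9 diseased plants out of 12 assessed has an incidence of 75%, and ratings of 1, 2, and 3 average to a severity of 15.5.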

2.1. GWAS

DNA was extracted from 120 sorghum accessions (60 from Niger, 60 from Senegal); downstream association and ML analyses used the subset with matched field phenotypes (n = 102). In brief, extraction was performed using either NucleoSpin Plant II kits (Macherey-Nagel, Düren, Germany, ref. 740770) or a modified CTAB protocol following the established methods of Prom et al. [17], Kale et al. [18], and Doyle and Doyle [19]. The DNA was purified using 7.5 M ammonium acetate and isopropanol. OD ratios (260/230, 260/280) were examined with a SpectraMax® QuickDrop™ Micro Volume spectrophotometer (Molecular Devices, San Jose, CA, USA). The samples were then run on a 1% agarose gel stained with ethidium bromide for quality control. Sequencing was performed at Texas A&M (TxGen, 1500 Research Pkwy Suite 250, College Station, TX, USA) using an Illumina NovaSeq 6000 (2.6× average coverage). Before library preparation, the samples were repurified. The raw sequencing data were processed using the GATK (Genome Analysis Tool Kit, Broad Institute, Cambridge, MA, USA) best practices for variant calling implemented in the DRAGEN platform (Illumina, San Diego, CA, USA). The sequences were aligned against Sorghum bicolor v3.1.1, available at Phytozome (https://phytozome-next.jgi.doe.gov/info/Sbicolor_v3_1_1, accessed on 1 May 2023), as the reference genome for SNP calling [20]. The calls were filtered using the following parameters: minimum coverage depth = 3 and minimum genotype quality = 9 on a Phred scale. The resulting variants were then filtered to retain only SNPs with a minimum minor allele frequency of 0.05 and a maximum missing-data rate of 0.5. Variant filtration was conducted using bcftools. The SNP data were finally imputed using Beagle (Beagle 8.7e1.jar) [21]. PLINK v1.9 [22] was used for VCF file conversion, and for computational efficiency, 500,000 SNPs were randomly subsampled from the full genotypic dataset for downstream GWAS and ML analyses.
For GWAS, GEMMA v0.98.3 [23] was used to conduct univariate linear mixed model association testing, accounting for relatedness via the relatedness matrix and employing the Wald test to determine statistical significance. The software package vcf2gwas v0.8.3 generated the phenotype and genotype distributions [24]. Top candidate SNPs from the GWAS were located in the reference sorghum genome sequence version 3.1.1, accessed through the JGI Phytozome 13 website. Available predicted protein structures of top candidate genes identified by GWAS were retrieved from the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/ accessed on 8 May 2023) [25] using the closest available Arabidopsis orthologs, and were used for qualitative structural context (Figure 1). Available predicted protein network nodes of top candidate genes were searched through the STRING database version 11.5 website (https://string-db.org/ accessed on 5 May 2023) [26].
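The SNP-level filters and the random subsampling step can be sketched in plain Python as follows. This is an illustration of the logic only, not the bcftools or PLINK commands used in the study; the function names, the dictionary layout (SNP identifier mapped to a list of allele dosages, with None for a missing call), and the fixed random seed are assumptions.

```python
import random


def maf_and_missing_rate(dosages):
    """Return (minor allele frequency, missing-data rate) for one biallelic SNP."""
    called = [d for d in dosages if d is not None]
    missing_rate = 1.0 - len(called) / len(dosages)
    if not called:
        return 0.0, missing_rate
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid call
    return min(alt_freq, 1.0 - alt_freq), missing_rate


def filter_snps(genotypes, min_maf=0.05, max_missing=0.5):
    """Keep SNPs meeting the MAF and missing-data thresholds given in the text."""
    kept = {}
    for snp_id, dosages in genotypes.items():
        maf, missing_rate = maf_and_missing_rate(dosages)
        if maf >= min_maf and missing_rate <= max_missing:
            kept[snp_id] = dosages
    return kept


def subsample_snps(snp_ids, k, seed=0):
    """Draw k SNP identifiers at random without replacement (seed is hypothetical)."""
    ids = sorted(snp_ids)
    if k >= len(ids):
        return ids
    return sorted(random.Random(seed).sample(ids, k))
```

Fixing the seed makes the subsample repeatable, which matters when GWAS and ML results are later compared on the same marker set.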

2.2. Data Merging, Preprocessing, and Standardization for ML

The analysis integrated genotype and phenotype data to create a unified dataset. SNP marker positions, extracted from the VCF file, were transposed so that individual samples (cultivars) were represented as rows and SNP markers as columns. This transposed genotype matrix was merged with phenotype data based on shared sample identifiers. Phenotype data included two traits: LB-Incidence (incidence of leaf blight) and LB-Severity (severity of leaf blight). Each trait was analyzed independently by defining it as the target variable while using SNP markers as features. Genotype data were encoded as allele dosages (0, 1, 2), representing the number of alternative alleles present at each SNP locus (homozygous reference = 0, heterozygous = 1, homozygous alternative = 2). This encoding aligns with the additive genetic framework in quantitative genetics, where allele dosage is assumed to contribute proportionally to phenotypic variation. While these values are numerical, they represent discrete genotype states rather than continuous measurements, allowing models such as CatBoost to effectively capture non-linear relationships without requiring extensive feature transformation. Prior to model training, the target variables were standardized to have zero mean and unit variance using StandardScaler from the sklearn library [27]. To avoid data leakage, scaling parameters were learned exclusively from the training folds and applied to the corresponding validation folds within each cross-validation iteration.
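The encoding, transposition, merging, and fold-wise standardization described above can be sketched as follows. This is a minimal standard-library illustration rather than the pipeline used in the study (which relied on StandardScaler from sklearn); the function names and the nested-dictionary phenotype layout are hypothetical. The population standard deviation is used because that is what StandardScaler computes.

```python
import statistics


def gt_to_dosage(gt):
    """Convert a biallelic VCF genotype string such as '0/1' to an allele dosage."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing call
    return sum(1 for allele in alleles if allele != "0")


def build_dataset(sample_ids, gt_rows, phenotypes, trait):
    """Transpose SNP-by-sample genotypes to sample-by-SNP rows and join one trait.

    gt_rows holds one list of genotype strings per SNP, ordered like sample_ids.
    Samples without a phenotype record are dropped from the merged dataset.
    """
    kept, features, target = [], [], []
    for col, sample in enumerate(sample_ids):
        if sample not in phenotypes:
            continue
        features.append([gt_to_dosage(row[col]) for row in gt_rows])
        target.append(phenotypes[sample][trait])
        kept.append(sample)
    return kept, features, target


def standardize_within_fold(train_y, valid_y):
    """Scale both sets with the mean and SD learned from the training fold only."""
    mean = statistics.fmean(train_y)
    sd = statistics.pstdev(train_y) or 1.0  # guard against zero variance
    return ([(v - mean) / sd for v in train_y],
            [(v - mean) / sd for v in valid_y])
```

Learning the mean and standard deviation from the training fold alone is what keeps information about the held-out accessions from entering the scaling step.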

2.3. ML Model Training, Evaluation, and Feature Importance Analysis

CatBoost [13] was selected due to its strong performance on high-dimensional datasets, its ability to model complex non-linear relationships, and its robustness in handling structured data without extensive preprocessing. These characteristics make it particularly suitable for genomic datasets, where the number of features (SNP markers) greatly exceeds the number of samples. Separate models were trained for LB-Incidence and LB-Severity using SNP markers as predictive features. The CatBoostRegressor was implemented using default hyperparameters, with verbose = 50 and thread_count = −1 specified to enable training monitoring and efficient CPU utilization. Accordingly, parameters such as learning rate, tree depth, number of iterations, and regularization strength were automatically determined by the algorithm and were not manually tuned. Default hyperparameters were intentionally used to ensure reproducibility and consistency across both traits and to reduce the risk of overfitting associated with extensive hyperparameter optimization in high-dimensional datasets with relatively small sample sizes. Given that the primary objective of this analysis was feature ranking and candidate SNP prioritization rather than predictive optimization, this approach provides a stable and interpretable baseline.
Model performance was evaluated using 10-fold cross-validation. Root Mean Squared Error (RMSE) was calculated for each fold based on predictions for held-out data, and the mean RMSE across folds was reported as the overall performance metric. Feature (genomic position) importance scores were computed using the trained CatBoost models to identify SNP markers most relevant to each trait. Importance scores were normalized to facilitate comparison across features. Cumulative importance was then calculated to determine the minimum number of SNP markers required to explain 80% of the total importance. These features were extracted and visualized using cumulative importance plots. For each trait, the top 20 SNP markers were identified and visualized using bar plots to highlight the most predictive genomic positions. While no overlap was observed among the top 20 SNPs between LB-Incidence and LB-Severity, extending the analysis to SNPs within the 80% cumulative importance threshold revealed shared markers between traits. This broader ranking approach reflects the quantitative and polygenic nature of leaf blight resistance and enables flexible prioritization of candidate SNPs for downstream validation.
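The evaluation loop described above can be sketched as follows, assuming a generic `fit_predict` callable so that the fold logic can be checked without any external package. The helper names, the fold assignment scheme, and the random seed are hypothetical; `catboost_fit_predict` mirrors the settings stated in the text (default hyperparameters with `verbose=50` and `thread_count=-1`) and imports CatBoost only when it is called.

```python
import math
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k (train, validation) pairs."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    return [(sorted(set(order) - set(fold)), sorted(fold)) for fold in folds]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def cross_validated_rmse(fit_predict, features, target, k=10, seed=0):
    """Mean RMSE over k folds; each fold is scored on its held-out samples only."""
    scores = []
    for train, valid in kfold_indices(len(target), k, seed):
        predictions = fit_predict([features[i] for i in train],
                                  [target[i] for i in train],
                                  [features[i] for i in valid])
        scores.append(rmse([target[i] for i in valid], predictions))
    return sum(scores) / len(scores)


def mean_fit_predict(x_train, y_train, x_valid):
    """Baseline that predicts the training mean, useful as a reference RMSE."""
    mean = sum(y_train) / len(y_train)
    return [mean] * len(x_valid)


def catboost_fit_predict(x_train, y_train, x_valid):
    """CatBoost with default hyperparameters, as described in the text."""
    from catboost import CatBoostRegressor  # imported lazily; requires catboost
    model = CatBoostRegressor(verbose=50, thread_count=-1)
    model.fit(x_train, y_train)
    return list(model.predict(x_valid))
```

Passing `mean_fit_predict` gives a baseline error against which the CatBoost RMSE can be read; with 102 accessions and 10 folds, each validation fold holds 10 or 11 accessions.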

3. Results

3.1. Leaf Blight Incidence and Severity in Nigerien Sorghum Germplasm

Leaf blight incidence and severity were evaluated across two locations in Niger (Bengou and Maradi) (Table 1 and Table 2). The mean incidence was 81.26 ± 1.37. Among the accessions evaluated, only S39, N23, and N38 had mean leaf blight incidence below 50%. Overall, the mean severity level in Niger was 24.5 ± 0.64. Accessions S3, S43, N23, and N38 exhibited the lowest mean severity when tested against the leaf blight-causing pathogen E. turcicum.
Mean leaf blight incidence and severity were compared between Bengou and Maradi (Figure 2), using plot-level observations grouped by location. Both traits were significantly lower in Bengou (incidence: 72.94 ± 1.91; severity: 16.04 ± 0.76) than in Maradi (incidence: 89.17 ± 1.86; severity: 32.54 ± 0.76; two-sample t-test, p < 0.0001), indicating a significant location effect.
Pearson’s correlation showed a positive association between accession-level mean incidence and mean severity (r = 0.61, p < 0.0001; Figure 3), indicating that accessions with higher incidence tended to exhibit higher severity.
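The accession-level correlation can be reproduced in outline with the sketch below, which first averages plot-level observations per accession and then computes Pearson's r. The function names and the tuple layout (accession, incidence, severity) are assumptions made for illustration; the p-value calculation is not included.

```python
import math
from collections import defaultdict


def accession_means(plots):
    """Average plot-level (accession, incidence, severity) records per accession."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for accession, incidence, severity in plots:
        entry = totals[accession]
        entry[0] += incidence
        entry[1] += severity
        entry[2] += 1
    return {acc: (t[0] / t[2], t[1] / t[2]) for acc, t in totals.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Averaging before correlating means each accession contributes one point, so replicate plots of the same accession are not treated as independent observations.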

3.2. GWAS of Leaf Blight Incidence

GWAS analysis identified two SNPs passing the Bonferroni threshold (S07_42352720 and S05_47989804) (Figure 4 and Table 3). Figure 4 displays the Manhattan plot, and Table 3 lists the nearby annotated genes for the top SNPs identified in this study. Figure 1 illustrates the protein structures of the top candidate genes generated based on Arabidopsis orthologs through the AlphaFold Protein Structure Database. Figure 5 shows predicted protein–protein interaction networks associated with selected candidate genes for leaf blight incidence in Niger. As not all candidate genes were available in AlphaFold and STRING databases, Figure 1 and Figure 5 show the data for a few candidate genes.
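For readers unfamiliar with the Bonferroni criterion, the sketch below shows how such a threshold is derived. The significance level of 0.05 and the use of the 500,000 subsampled SNPs as the number of tests are assumptions for illustration; the text does not state the exact values applied in the GWAS, and the function names are hypothetical.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_hits(pvalues, alpha=0.05):
    """Return SNP identifiers whose p-value is below the cutoff, smallest first."""
    threshold = bonferroni_threshold(len(pvalues), alpha)
    hits = [snp for snp, p in pvalues.items() if p < threshold]
    return sorted(hits, key=pvalues.get)


# Under the stated assumptions, the cutoff is 0.05 / 500,000 = 1e-7,
# which corresponds to a height of about 7 on a -log10(p) Manhattan plot.
cutoff = bonferroni_threshold(500_000)
manhattan_line = -math.log10(cutoff)
```

Because the cutoff shrinks in proportion to the number of markers tested, only associations with very small p-values pass, which is consistent with few SNPs clearing the threshold in a panel of about 100 accessions.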

3.3. Identification of Predictive SNP Markers for Leaf Blight Resistance Using ML

To further examine genomic predictors associated with leaf blight incidence and severity, we employed the CatBoost ML algorithm to rank SNP markers for both traits. Given the high-dimensional nature of the SNP dataset relative to the number of accessions, this analysis was designed to prioritize informative markers based on their relative contribution to phenotypic variation rather than to optimize predictive performance. Model performance was evaluated using 10-fold cross-validation, with Root Mean Squared Error (RMSE) calculated on held-out folds and averaged across iterations. The CatBoost models yielded mean RMSE values of 0.74 for LB-Incidence (Supplementary Figure S1a) and 0.94 for LB-Severity (Supplementary Figure S1b), indicating moderate predictive performance consistent with the quantitative and polygenic nature of these traits. An 80% cumulative importance threshold was used to identify the most influential SNP markers contributing to LB-Incidence (Supplementary Figure S1c; Supplementary Table S1) and LB-Severity (Supplementary Figure S1d; Supplementary Table S2). This threshold summarizes the subset of markers accounting for the majority of the feature-importance mass in the CatBoost models and provides a biologically meaningful set of candidate loci for downstream investigation. The CatBoost model prioritized a set of SNP markers for leaf blight incidence (Figure 6). The top 20 markers, ranked by their scaled importance scores, are distributed across multiple chromosomes, supporting a complex and potentially polygenic genetic architecture underlying this trait. Notably, marker S08_10325034 exhibited the highest importance score, accounting for approximately 100% of the scaled importance, followed by S10_6954311 and S02_27002932, with importance scores of approximately 49.2% and 42.1%, respectively. The remaining markers in the top 20 displayed importance scores ranging from 35.6% to 12.9% (Figure 6).
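The two summaries used here, scaled importance and the 80% cumulative importance threshold, can be sketched as follows. It is assumed that "scaled" means each score is expressed relative to the top-ranked marker, which would make the first marker 100% by construction; the function names and the input dictionary (SNP identifier mapped to a raw importance score) are hypothetical.

```python
def scale_to_top(importances):
    """Express each importance as a percentage of the largest importance."""
    top = max(importances.values())
    return {snp: 100.0 * value / top for snp, value in importances.items()}


def markers_for_cumulative_share(importances, share=0.80):
    """Smallest set of top-ranked markers whose importances sum to the given share."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for snp, value in ranked:
        if running >= share * total:
            break
        selected.append(snp)
        running += value
    return selected
```

With raw importances of 60, 30, 8, and 2, the scaled scores are 100%, 50%, about 13.3%, and about 3.3%, and the first two markers already account for 90% of the total, so they form the 80% set. The two views answer different questions: the scaled score compares each marker with the best one, while the cumulative share shows how many markers carry most of the model's importance.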
To identify potential candidate genes associated with these predictive markers, we examined the genomic regions surrounding the top five SNPs for LB-Incidence (Table 4). These regions harbor genes with diverse predicted functions, including a serine carboxypeptidase, a flavonol-3-O-glycoside-7-O-glucosyltransferase, an LETM1-like protein, an oligopeptide transporter, and a phosphatidate cytidylyltransferase. Similarly, the CatBoost analysis for leaf blight severity revealed a distinct set of prioritized SNP markers (Figure 7). The top 20 markers for severity also spanned multiple chromosomes and exhibited a range of importance scores, further supporting a complex genetic basis. Marker S02_47024283 had the highest importance score of approximately 100%, followed by S05_38801458 and S09_24964148, with importance scores of approximately 98.4% and 97.5%, respectively. The importance scores for the remaining top 20 markers ranged from 94.6% to 57.8% (Figure 7). The top five SNPs associated with LB-Severity were also linked to nearby candidate genes (Table 4). These include genes encoding an embryo-defective protein, an uncharacterized protein, a cis-zeatin O-beta-D-glucosyltransferase, and a leucine-rich repeat (LRR) protein. A comparison of the top 20 marker sets for incidence and severity revealed no overlap between the two traits, suggesting that these phenotypes may capture partly distinct genomic signals in this dataset. However, two SNPs—S02_50596409 (associated with nearby genes including members of the DVL family and glutathione S-transferase) and S02_54879183 (near a gene encoding a glucosyltransferase)—exhibited moderately high importance scores in both the incidence and severity models, despite not being among the top 20 markers for either trait. This observation suggests that while the primary genetic factors influencing incidence and severity may be largely distinct, a subset of genomic regions may contribute to both traits at moderate effect sizes. 
Overall, these results highlight the utility of the CatBoost framework for prioritizing candidate SNPs in complex, quantitative disease resistance traits, while emphasizing that the identified markers represent ranked candidates requiring further validation rather than definitive causal loci.

4. Discussion

The impact of climate change, coupled with population growth to around 9.1 billion by 2050, will require increases in crop production, including cereals such as sorghum for food, feed, and other uses [28]. Increasing sorghum production in regions affected by fungal diseases, including E. turcicum, will require integrated disease management strategies, including the identification of resistant germplasm [1,8,29]. Sorghum plays an integral role in the lives of millions of inhabitants in Niger and is used primarily for food, feed, and commerce [30,31,32]. The crop ranks behind pearl millet in importance [30,31]. However, sorghum yields in Niger are still low due to several factors, including diseases such as leaf blight [7]. During the 2019 and 2022 surveys of sorghum diseases across major production regions in Niger, the prevalence of leaf blight was 89% and 100%, respectively [7,8]. This disease is widespread in all sorghum growing regions of Niger, suggesting that evaluating accessions for resistance under natural infection or inoculation with the pathogen can both be effective. However, differences in leaf blight incidence and severity between Bengou, Dosso region and Maradi, Maradi region were noted. These differences could be attributed to the differences in weather patterns as recorded in Table 5 [8] or the pathotypes that exist in the two locations. This study’s mean leaf blight incidence rate was 81.26 ± 1.37, while the mean severity level in Niger was 24.5 ± 0.64. These values can be viewed alongside the previously reported Senegal evaluation of the same accessions [17], although the present study focuses on field performance observed in Niger. The mean incidence of leaf blight was 87% across five regions in Niger surveyed for sorghum diseases in 2022, while the mean severity was 22% [7]. Some regions have reported leaf blight-resistant sorghums, including Puerto Rico, Sudan, and Ethiopia [5,6,11]. 
Among the accessions evaluated in Senegal and Niger, N30 from Niger exhibited low leaf blight severity in Senegal [17]. GWAS is a promising approach to genetic analysis and has proven to be a valuable tool in identifying candidate genes for many plant traits [33]. The same accessions used in this study were planted in three locations in Senegal, and the GWAS identified six SNPs associated with the average leaf blight incidence rate [17]. In the Senegal GWAS, the candidate genes were found on chromosomes 2, 3, 5, 8, and 9 [17], while the SNPs in the current Niger study were found on chromosomes 5, 7, and 10. Using three tropical maize germplasm mapping panels evaluated against Setosphaeria turcica, the causal agent of Northern corn leaf blight (NCLB), a GWAS identified 22 SNPs significantly associated with NCLB response [34], while Ding et al. [35] noted 12 and 10 loci that were significantly associated with NCLB resistance. Candidate genes found in both sorghum and maize in response to leaf blight are reported to play a role in resistance in both crops [36]. Lipps et al. [37] evaluated two sorghum recombinant inbred line populations for response against E. turcicum and detected six QTLs. Together, these genetic studies of sorghum and maize responses to E. turcicum will continue to enhance our knowledge of the genes involved in resistance mechanisms in these two economically important crops. Because incidence and severity differed significantly between Bengou and Maradi, environmental heterogeneity likely contributed to phenotypic variation in this study. Accordingly, the present GWAS based on accession-level means across locations should not be interpreted as a formal genotype-by-environment analysis.
Herein, GWAS identified candidate SNPs associated with leaf blight incidence in Niger. The closest annotated gene to S07_42352720 is lysine ketoglutarate reductase trans-splicing related 1 (DUF707). In a transcriptome study of rice (Oryza sativa), DUF707 was highly expressed when exposed to herbicide [38]. Moreover, as displayed in Figure 5a, the predicted protein network nodes of the gene include protein kinase crinkly4 and glycosyltransferase. Glycosyltransferases have been reported to be top defense-related genes against fungal pathogens in sorghum and have an essential role in plant defense and stress tolerance [39,40]. The second listed candidate gene is Sobic.005G115200, which contains the RING finger domain. RING finger proteins are essential in governing growth and development, hormone signaling, and controlling responses to biotic and abiotic stresses in plants [41]. The third listed gene is the zinc finger-containing Sobic.007G116000. In Nicotiana benthamiana, zinc-finger protein plays a crucial role in activating the pathogen defense response in plants [42]. These plants also showed constitutive up-regulation of multiple defense-related genes. Phosphofructokinase was the closest gene to the SNP locus S05_53064984. Phosphofructokinase is critical in sugar metabolism and is closely linked to drought stress responses in cotton (Gossypium arboreum L.) [43]. Predicted protein network nodes of the gene were also highly linked with RING-H2 finger protein, lateral organ boundaries (LOB) domains, and hexokinase 5. LOB domain (LBD) proteins have been well documented, with biological roles in plant development and defense response processes [44]. Likewise, mitochondria-associated hexokinases control programmed cell death in N. benthamiana [45].
Plant steroids (Sobic.010G125000) are perceived by cell surface receptors that contain transmembrane receptor serine/threonine kinases that play a central role in signaling during pathogen recognition, the subsequent activation of plant defense mechanisms and developmental control [46,47,48].
While GWAS has been a valuable tool for identifying candidate genes associated with important plant traits [33], this study employed both GWAS and a machine learning (CatBoost) approach to identify SNP markers associated with leaf blight resistance in Nigerien and Senegalese sorghum germplasms. Complementing the GWAS results, the CatBoost analysis identified a distinct set of SNP markers with high predictive importance for both leaf blight incidence and severity (Figure 6 and Figure 7). Notably, there was no overlap between the significant SNPs identified by traditional GWAS and the top-ranked markers identified by the CatBoost algorithm for either incidence or severity. This lack of concordance between the two analytical approaches suggests that ML methods such as CatBoost may prioritize genomic regions not highlighted by conventional GWAS in this dataset, potentially due to their ability to capture non-linear effects and interactions among loci. This pattern further indicates that incidence and severity may require separate consideration in future marker prioritization and validation efforts. It is important to note that the CatBoost framework in this study was applied primarily as a feature-ranking tool rather than a fully optimized predictive model. Given the high dimensionality of the SNP dataset relative to the number of accessions, and the quantitative nature of the traits, the analysis focused on identifying and prioritizing candidate SNPs based on their relative contribution to phenotypic variation. Although the models demonstrated moderate predictive performance (as reflected by cross-validated RMSE values), these results are consistent with expectations for complex polygenic traits influenced by many loci with small to moderate effects. Therefore, the identified SNPs should be interpreted as prioritized candidates rather than definitive causal variants.
For leaf blight incidence, the CatBoost model highlighted a region on chromosome 8 containing a gene encoding a serine carboxypeptidase 1 precursor (Sobic.008G073700) near the most predictive SNP marker (S08_10325034, Table 4). Serine carboxypeptidases and their close relatives, serine carboxypeptidase-like proteins, are known to play multifaceted roles in plant growth, development, and stress responses [49]. A comprehensive study of the SCPL gene family in soybean identified 73 SCPL genes and demonstrated their involvement in resistance to both biotic and abiotic stresses, including nematode infection, drought, salinity, and cold [49]. Another notable candidate gene identified for incidence is Sobic.010G081600, encoding a flavonol-3-O-glycoside-7-O-glucosyltransferase (Table 4). Flavonoids are a diverse group of secondary metabolites known to play significant roles in plant defense against both biotic and abiotic stresses [50]. The role of flavonol-3-O-glycoside-7-O-glucosyltransferase in modifying flavonoids suggests that it may influence the accumulation or activity of specific flavonoid compounds contributing to leaf blight resistance in sorghum [50].
For leaf blight severity, a gene encoding a leucine-rich repeat (LRR) protein (Sobic.002G104500) was identified near a highly ranked SNP marker for severity (S02_12376308, Table 4). LRR proteins, particularly those belonging to the LRR receptor-like kinase (LRR-RLK) family, are known to function as pattern recognition receptors (PRRs) in plants, playing crucial roles in the perception of pathogen-associated molecular patterns (PAMPs) and the activation of downstream defense responses [51]. The identification of an LRR protein gene near a highly ranked SNP marker suggests that this protein may be involved in the recognition of Exserohilum turcicum-derived PAMPs and the initiation of defense signaling pathways in sorghum [51].
In this study, top candidate genes identified through GWAS and CatBoost were associated with functions related to plant defense mechanisms and stress responses, making these accessions valuable candidates for follow-up evaluation in sorghum leaf blight research. The use of CatBoost, in particular, highlighted additional markers and nearby candidate genes not prioritized by GWAS alone, supporting the value of complementary analytical approaches for leaf blight-associated traits. Additionally, it is important to consider other genes identified through the STRING database that are connected to the top candidates. Functional validation remains essential to confirm the biological relevance of these associations. However, gene editing in sorghum using CRISPR/Cas9 remains technically challenging due to its monocot nature. Expanding the number of sorghum accessions from Niger and Senegal and identifying overlapping signals across independent studies will be critical for refining candidate gene selection, while advances in gene-editing technologies may facilitate future functional validation.
GWAS has been instrumental in identifying genetic variations linked to specific phenotypes, such as candidate genes for blackleg resistance in Brassica juncea [52]. However, GWAS is limited in detecting rare genetic variants and often struggles to capture non-linear effects of genomic variation on traits [53]. While GWAS has identified numerous trait-associated loci, the causal genes within these loci often remain unclear, making functional validation challenging [54]. In contrast, machine learning offers a complementary framework by leveraging high-dimensional genomic data to uncover complex genotype–phenotype relationships [55]. ML approaches can capture subtle genomic signals, including non-linear interactions and small-effect variants, thereby enhancing our understanding of complex traits such as disease resistance [56]. By integrating genomic data, ML models can prioritize genes and alleles contributing to defense mechanisms, offering an alternative perspective to traditional association methods. While ML has been widely applied to agronomic traits such as yield and flowering time, its application to disease resistance remains relatively limited. Emerging studies in crops such as rice, wheat, maize, and sugarcane highlight its potential, although no single method is universally optimal, emphasizing the need for context-specific and integrative analytical strategies [57].
For leaf blight resistance in sorghum, the differences in SNP positions identified by GWAS and CatBoost highlight these methodological contrasts. GWAS relies on single-marker linear regression and may overlook loci with polygenic effects or small contributions. In contrast, CatBoost employs ensemble learning and non-linear modeling, enabling the detection and prioritization of SNPs associated with both LB-Incidence and LB-Severity. The inability of GWAS to identify Bonferroni-significant markers for LB-Severity, together with the CatBoost prioritization of markers for both traits, suggests that the two approaches capture different aspects of the genetic architecture. These findings support the interpretation that GWAS and ML methods are complementary rather than competing approaches for dissecting complex disease resistance traits.
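The methodological contrast described above can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the pipeline used in this study: it does not call GEMMA or CatBoost, and the balanced two-marker panel, the purely epistatic trait, and the helper names `pearson_r` and `two_split_rule` are all hypothetical. It shows that a single-marker linear statistic can be zero for each of two markers even though a rule that splits on both markers, as a shallow decision tree could, reproduces the trait.

```python
from itertools import product
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation, the quantity a single-marker linear test relies on."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy panel (not study data): every combination of two biallelic
# markers coded 0/1, repeated 10 times so that the design is balanced.
genotypes = [g for g in product([0, 1], repeat=2) for _ in range(10)]
snp_a = [g[0] for g in genotypes]
snp_b = [g[1] for g in genotypes]

# Purely epistatic toy trait: the phenotype is high only when the markers differ.
trait = [float(a != b) for a, b in genotypes]

# Single-marker view: each SNP on its own shows no linear association.
r_a = pearson_r(snp_a, trait)
r_b = pearson_r(snp_b, trait)


def two_split_rule(a, b):
    """Split on snp_a first, then on snp_b, as a depth-2 decision tree could."""
    if a == 0:
        return 1.0 if b == 1 else 0.0
    return 0.0 if b == 1 else 1.0


# Joint view: the two-split rule is compared with the trait for every sample.
joint_predictions = [two_split_rule(a, b) for a, b in genotypes]
joint_accuracy = mean(float(p == t) for p, t in zip(joint_predictions, trait))

print(f"marginal r (snp_a) = {r_a:.3f}")
print(f"marginal r (snp_b) = {r_b:.3f}")
print(f"two-split rule accuracy = {joint_accuracy:.2f}")
```

Real panels are neither balanced nor purely epistatic, so this sketch only indicates why marker sets ranked by the two approaches need not overlap.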

5. Conclusions

This study evaluated 102 sorghum accessions from Niger and Senegal (including checks) for leaf blight incidence and severity across two field locations in Niger, revealing substantial phenotypic variation for both traits. Accessions S39, N23, and N38 had the lowest incidence, with mean leaf blight incidence below 50%, while S3, S43, N23, and N38 showed the lowest severity levels. These accessions represent valuable genetic resources for sorghum improvement programs targeting disease resistance. By integrating traditional GWAS with a machine learning (CatBoost) framework, this study identified complementary sets of candidate SNP markers and nearby genes associated with leaf blight incidence and severity. Notably, the lack of overlap between the top-ranked markers for incidence and severity, as well as between GWAS and CatBoost results, suggests that these traits may be governed by partly distinct and complex genetic architectures. The CatBoost analysis further prioritized additional genomic regions not captured by GWAS alone, highlighting the value of ML approaches in uncovering non-linear and polygenic signals in high-dimensional genomic data. Importantly, the ML framework was applied as a feature-ranking and candidate prioritization tool rather than a fully optimized predictive model. As such, the identified SNPs should be interpreted as prioritized candidates for further investigation rather than confirmed causal variants. The integration of GWAS and ML thus provides a complementary and scalable strategy for identifying genomic regions associated with complex disease resistance traits. Overall, these findings contribute to a better understanding of the genetic basis of leaf blight resistance in sorghum and provide a set of candidate markers for future research. However, validation in larger and independent populations, as well as functional characterization of candidate genes, will be essential before these markers can be effectively deployed in breeding programs.
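Using an ML model as a feature-ranking tool can be sketched as a cumulative-importance cut. The example below is a minimal sketch, assuming that markers are kept, highest importance first, until they account for a chosen share of the total importance. The 80% default is only suggested by the supplementary table names; the function name `select_by_cumulative_importance`, the placeholder marker names, and the scores are hypothetical and are not values from this study.

```python
def select_by_cumulative_importance(importances, threshold=0.80):
    """Return markers, highest importance first, until their cumulative
    share of the total importance reaches `threshold`."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for marker, score in ranked:
        selected.append(marker)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


# Hypothetical importance scores for six placeholder markers (not study values).
toy_importances = {
    "snp_1": 40.0,
    "snp_2": 25.0,
    "snp_3": 12.0,
    "snp_4": 10.0,
    "snp_5": 8.0,
    "snp_6": 5.0,
}

prioritized = select_by_cumulative_importance(toy_importances, threshold=0.80)
print(prioritized)
```

A list produced this way is a ranking, not a significance test, which is why such markers are best read as prioritized candidates rather than confirmed associations.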

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/pathogens15040389/s1, Figure S1: Cross-validation and cumulative feature importance for LB-Incidence and LB-Severity; Table S1: Copy of S1_top_80_persent_comlmp_positions_catboost_inc; Table S2: Copy of S2_top_80_persen_cumImp_positions_catboost_sev.

Author Contributions

Conceptualization, L.K.P. and C.W.M.; methodology, L.K.P., L.C.P. and C.W.M.; formal analysis, L.K.P., A.R.T., J.R.B., E.J.S.A., S.P. and L.C.P.; investigation, L.K.P.; resources, C.W.M.; software, A.R.T. and J.R.B.; data curation, L.K.P. and E.J.S.A.; funding acquisition, C.W.M.; supervision, C.W.M.; project administration, C.W.M.; writing—original draft, L.K.P. and E.J.S.A.; writing—review and editing, all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research (CRIS # 3091-22000-040-000-D) was supported in part by the U.S. Department of Agriculture, Agricultural Research Service. The USDA is an equal opportunity provider and employer. This research was also funded by AFRI, NIFA, USDA grant number 20156800423492. This research was made possible with the support of the American people provided to the Feed the Future Innovation Lab for Collaborative Research on Sorghum and Millet through the United States Agency for International Development (USAID). The contents are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. Program activities are funded by USAID under Cooperative Agreement No. AID-OAA-A-13-00047.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon reasonable request from the authors L.K.P. and C.W.M. The underlying SNP datasets for this sorghum population are publicly available at the Texas Data Repository (TDR) under a CC0 agreement: https://doi.org/10.18738/T8/RGPPGA (accessed on 30 March 2026). The raw sequence data for the C2 accessions are also available in the Sequence Read Archive (SRA) under accession number PRJNA1161677. Supplementary Tables S1 and S2 are provided with the manuscript, and additional processed analysis outputs used in the present study are available from the corresponding authors upon reasonable request.

Acknowledgments

The authors would like to express their sincere gratitude to Haougui Adamou, Bibata O. Ali, and Issa Karimou for their input and technical assistance.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bergquist, R. Leaf Blight. In Compendium of Sorghum Diseases, 2nd ed.; Frederiksen, R.A., Odvody, G.N., Eds.; The American Phytopathological Society: St. Paul, MN, USA, 2000; pp. 9–10.
  2. El-Naggar, A.A.A. Occurrence of Exserohilum turcicum f. sp. sorghi, the causal organism of sorghum leaf blight in Upper Egypt. J. Plant Prot. Path. 2012, 3, 337–346.
  3. Adhikari, P.; Mideros, S.X.; Jamann, T.M. Differential regulation of maize and sorghum orthologs in response to the fungal pathogen Exserohilum turcicum. Front. Plant Sci. 2021, 12, 675208.
  4. Durga, K.K. Leaf blight Exserohilum turcicum (Pass.) of sorghum—A Review. Agric. Rev. 2002, 23, 175–184.
  5. Beshir, M.M.; Ahmed, N.E.; Mukhtar, A.; Babiker, I.H.; Rubaihayo, P.; Okori, P. Prevalence and severity of sorghum leaf blight in the sorghum growing areas of Central Sudan. Wudpecker J. Agric. Res. 2015, 4, 54–60.
  6. Tesema, M.L.; Mengesha, G.G.; Dojamo, T.S.; Takiso, S.M. Response of sorghum genotypes for turcicum leaf blight [Exserohilum turcicum (Pass.) Leonard & Suggs] and agronomic performances in Southern Ethiopia. Int. J. Sci. Res. Arch. 2022, 5, 77–95.
  7. Prom, L.K.; Adamou, H.; Bibata, A.O.; Issa, K.; Abdoulkadri, A.A.; Oumarou, O.H.; Adamou, B.; Fall, C.; Magill, C. Incidence, severity, and prevalence of sorghum diseases in the major production regions in Niger. J. Plant Stud. 2023, 12, 48–59.
  8. Prom, L.K.; Haougui, A.; Adamou, I.; Abdoulkadri, A.A.; Karimou, I.; Ali, O.B.; Magill, C. Survey of the prevalence and incidence of foliar and panicle diseases of sorghum across production fields in Niger. Plant Pathol. J. 2020, 19, 106–113.
  9. Prom, L.K.; Sarr, M.P.; Diatta, C.; Ngom, A.; Aïdara, O.; Cissé, N.; Magill, C. The occurrence and distribution of sorghum diseases in major production regions of Senegal, West Africa. Plant Pathol. J. 2021, 20, 1–10.
  10. Ogolla, F.O.; Muraya, M.M.; Onyango, B.O. Incidence and severity of turcicum leaf blight caused by Exserohilum turcicum (Pass.) Leonard and Suggs on sorghum populations in different regions of Tharaka Nithi County, Kenya. J. Sci. Eng. Res. 2018, 6, 104–111.
  11. Hepperly, P.R.; Sotomayor-Ríos, A. New sorghum leaf blight resistance sources: Identification, description and reactions of F1 hybrids. J. Agric. Univ. P. R. 1987, 71, 293–299.
  12. Beshir, M.; Okori, P.; Ahmed, N.E.; Rubaihayo, P.; Ali, A.M.; Karim, S. Resistance to anthracnose and turcicum leaf blight in sorghum under dual infection. Plant Breed. 2016, 135, 318–322.
  13. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 2–8 December 2018; Volume 31, pp. 6638–6648.
  14. Republic of Niger. Latitude. Available online: https://latitude.to/map/ne/niger (accessed on 3 February 2025).
  15. Latitude Cities Dosso and Maradi. Latitude and Longitude. Available online: https://latitude.to/map/ne/niger/cities (accessed on 3 February 2025).
  16. Moussa, S. Situation of soils in Niger: Constraints and needs. In Proceedings of the Global Soil Partnership (GSP) in West Africa, Accra, Ghana, 4–5 February 2013; p. 24. Available online: https://openknowledge.fao.org/handle/20.500.14283/az989e (accessed on 4 August 2020).
  17. Prom, L.K.; Botkin, J.R.; Ahn, E.J.S.; Sarr, M.P.; Diatta, C.; Fall, C.; Magill, C.W. A Genome-Wide Association Study of Nigerien and Senegalese Sorghum Germplasm of Exserohilum turcicum, the Causal Agent of Leaf Blight. Plants 2023, 12, 4010.
  18. Kale, S.S.; Kadu, T.P.; Chavan, N.R.; Chavan, N.S. Rapid and efficient method of genomic DNA extraction from sweet sorghum [Sorghum bicolor (L.)] using leaf tissue. Int. J. Chem. Stud. 2020, 8, 1166–1169.
  19. Doyle, J.J.; Doyle, J. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 1987, 19, 11–15.
  20. McCormick, R.F.; Truong, S.K.; Sreedasyam, A.; Jenkins, J.; Shu, S.; Sims, D.; Mullet, J.E. The Sorghum bicolor reference genome: Improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. Plant J. 2018, 93, 338–354.
  21. Browning, B.L.; Browning, S.R. Genotype imputation with millions of reference samples. Am. J. Hum. Genet. 2016, 98, 116–126.
  22. Chang, C.C.; Chow, C.C.; Tellier, L.C.; Vattikuti, S.; Purcell, S.M.; Lee, J.J. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 2015, 4, 7.
  23. Zhou, X.; Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012, 44, 821–824.
  24. Vogt, F.; Shirsekar, G.; Weigel, D. vcf2gwas: Python API for comprehensive GWAS analysis using GEMMA. Bioinformatics 2022, 38, 839–840.
  25. Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Stroe, O.; Wood, G.; Laydon, A.; et al. AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models. Nucleic Acids Res. 2022, 50, D439–D444.
  26. Szklarczyk, D.; Gable, A.L.; Nastou, K.C.; Lyon, D.; Kirsch, R.; Pyysalo, S.; Doncheva, N.T.; Legeay, M.; Fang, T.; Bork, P.; et al. The STRING Database in 2021: Customizable Protein–Protein Networks, and Functional Characterization of User-Uploaded Gene/Measurement Sets. Nucleic Acids Res. 2021, 49, D605–D612.
  27. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  28. FAO. Global Agriculture Towards 2050. Available online: https://www.fao.org/fileadmin/user_upload/lon/HLEF2050_Global_Agriculture.pdf (accessed on 17 December 2023).
  29. Frederiksen, R.; Odvody, G. Compendium of Sorghum Diseases, 2nd ed.; American Phytopathological Society (APS Press): St. Paul, MN, USA, 2000.
  30. Hamidou, M.; Souley, A.K.M.; Kapran, I.; Souleymane, O.; Danquah, E.D.; Ofori, K.; Gracen, V.; Ba, M.N. Genetic Variability and Its Implications on Early Generation Sorghum Lines Selection for Yield, Yield Contributing Traits, and Resistance to Sorghum Midge. Int. J. Agron. 2018, 2018, 1864797.
  31. Manssour, A.M.; Zoubeirou, A.M.; Nomao, D.L.; Dijibo, E.S.; Ambouta, J.M.K. Productivity of the cultivation of sorghum (Sorghum bicolor) in Acacia senegal (L.) Willd. based agroforestry system in Niger. J. Appl. Biosci. 2014, 82, 7339–7346.
  32. Ousmane, S.D.; Dzidzienyo, D.K.; Soule, A.K.M.; Haoua, H.B.; Teme, N.; Eric, D.; Pangirayi, T.; Tuinstra, M. Sorghum Bmr6 and Bmr12 lines may provide new forage opportunities in West Africa. Int. Res. J. Plant Sci. 2023, 14, 26.
  33. Alqudah, A.M.; Sallam, A.; Baenziger, P.S.; Börner, A. GWAS: Fast-Forwarding gene identification and characterization in temperate Cereals: Lessons from Barley—A review. J. Adv. Res. 2020, 22, 119–135.
  34. Rashid, Z.; Sofi, M.; Harlapur, S.I.; Kachapur, R.M.; Dar, Z.A.; Singh, P.K.; Zaidi, P.H.; Vivek, B.S.; Nair, S.K. Genome-wide association studies in tropical maize germplasm reveal novel and known genomic regions for resistance to Northern corn leaf blight. Sci. Rep. 2020, 10, 21949.
  35. Ding, J.; Ali, F.; Chen, G.; Li, H.; Mahuku, G.; Yang, N.; Narro, L.; Magorokosho, C.; Makumbi, D.; Yan, J. Genome-wide association mapping reveals novel sources of resistance to northern corn leaf blight in maize. BMC Plant Biol. 2015, 15, 206.
  36. Zhang, X.; Fernandes, S.B.; Kaiser, C.; Adhikari, P.; Brown, P.J.; Mideros, S.X.; Jamann, T.M. Conserved defense responses between maize and sorghum to Exserohilum turcicum. BMC Plant Biol. 2020, 20, 67.
  37. Lipps, S.; Rooney, W.L.; Mideros, S.X.; Jamann, T.M. Identification of quantitative trait loci for sorghum leaf blight resistance. Crop Sci. 2022, 62, 1550–1558.
  38. Lu, Y.C.; Zhang, J.J.; Luo, F.; Huang, M.T.; Yang, H. RNA-Sequencing Oryza Sativa Transcriptome in Response to Herbicide Isoprotruon and Characterization of Genes Involved in IPU Detoxification. RSC Adv. 2016, 6, 18852–18867.
  39. Ahn, E.; Hu, Z.; Perumal, R.; Prom, L.K.; Odvody, G.; Upadhyaya, H.D.; Magill, C.W. Genome wide association analysis of sorghum mini core lines regarding anthracnose, downy mildew, and head smut. PLoS ONE 2019, 14, e0216671.
  40. Vogt, T.; Jones, P. Glycosyltransferases in plant natural product synthesis: Characterization of a supergene family. Trends Plant Sci. 2000, 5, 380–386.
  41. Yu, Y.H.; Xu, W.R.; Wang, S.Y.; Xu, Y.; Li, H.E.; Wang, Y.J.; Li, S.X. VPRFP1, a novel C4C4-type ring finger protein gene from Chinese wild Vitis pseudoreticulata, functions as a transcriptional activator in defence response of grapevine. J. Exp. Bot. 2011, 62, 5671–5682.
  42. Oh, S.K.; Park, J.M.; Joung, Y.H.; Lee, S.; Chung, E.; Kim, S.Y.; Yu, S.H.; Choi, D. A plant EPF-type zinc-finger protein, CaPIF1, involved in defence against pathogens. Mol. Plant Pathol. 2005, 6, 269–285.
  43. Mehari, T.G.; Xu, Y.; Umer, M.J.; Hui, F.; Cai, X.; Zhou, Z.; Hou, Y.; Wang, K.; Wang, B.; Liu, F. Genome-Wide Identification and Expression Analysis Elucidates the Potential Role of PFK Gene Family in Drought Stress Tolerance and Sugar Metabolism in Cotton. Front. Genet. 2022, 13, 922024.
  44. Zhang, Y.; Li, Z.; Ma, B.; Hou, Q.; Wan, X. Phylogeny and Functions of LOB Domain Proteins in Plants. Int. J. Mol. Sci. 2020, 21, 2278.
  45. Kim, M.; Lim, J.-H.; Ahn, C.S.; Park, K.; Kim, G.T.; Kim, W.T.; Pai, H.-S. Mitochondria-associated hexokinases play a role in the control of programmed cell death in Nicotiana benthamiana. Plant Cell 2006, 18, 2341–2355.
  46. Afzal, A.J.; Wood, A.J.; Lightfoot, D.A. Plant receptor-like serine threonine kinases: Roles in signaling and plant defense. Mol. Plant Microbe Interact. 2008, 21, 507–517.
  47. Li, J. Brassinosteroids signal through two receptor-like kinases. Curr. Opin. Plant Biol. 2003, 6, 494–499.
  48. Zhang, X.; Wang, X.; Xu, K.; Jiang, Z.; Dong, K.; Xie, X.; Zhang, H.; Yue, N.; Zhang, Y.; Wang, X.-B.; et al. The serine/threonine/tyrosine kinase STY46 defends against hordeivirus infection by phosphorylating γb protein. Plant Physiol. 2021, 186, 715–730.
  49. He, L.; Liu, Q.; Han, S. Genome-Wide Analysis of Serine Carboxypeptidase-like Genes in Soybean and Their Roles in Stress Resistance. Int. J. Mol. Sci. 2024, 25, 6712.
  50. Treutter, D. Significance of Flavonoids in Plant Resistance: A Review. Environ. Chem. Lett. 2006, 4, 147–157.
  51. Postel, S.; Kemmerling, B. Plant systems for recognition of pathogen-associated molecular patterns. Semin. Cell Dev. Biol. 2009, 20, 1025–1031.
  52. Yang, H.; Mohd Saad, N.S.; Ibrahim, M.I.; Bayer, P.E.; Neik, T.X.; Severn-Ellis, A.A.; Pradhan, A.; Tirnaz, S.; Edwards, D.; Batley, J. Candidate Rlm6 resistance genes against Leptosphaeria maculans identified through a genome-wide association study in Brassica juncea (L.) Czern. Theor. Appl. Genet. 2021, 134, 2035–2050.
  53. Zhao, Y.; Mette, M.F.; Gowda, M.; Longin, C.F.H.; Reif, J.C. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat. Heredity 2014, 112, 638–645.
  54. Nicholls, H.L.; John, C.R.; Watson, D.S.; Munroe, P.B.; Barnes, M.R.; Cabrera, C.P. Reaching the end-game for GWAS: Machine learning approaches for the prioritization of complex disease loci. Front. Genet. 2020, 11, 350.
  55. Hu, T.; Darabos, C.; Urbanowicz, R. Editorial: Machine learning in genome-wide association studies. Front. Genet. 2020, 11, 593958.
  56. Sperschneider, J. Machine learning in plant–pathogen interactions: Empowering biological predictions from field scale to genome scale. New Phytol. 2020, 228, 35–41.
  57. Upadhyaya, S.R.; Danilevicz, M.F.; Dolatabadian, A.; Neik, T.X.; Zhang, F.; Al-Mamun, H.A.; Bennamoun, M.; Batley, J.; Edwards, D. Genomics-based plant disease resistance prediction using machine learning. Plant Pathol. 2024, 73, 2298–2309.
Figure 1. Predicted protein structures of top candidate genes. (a) Sobic.007G111800, (b) Sobic.005G115200, (c) Sobic.005G120800, and (d) Sobic.010G125000. Model confidence—blue: very high (pLDDT > 90), sky blue: confident (90 > pLDDT > 70), yellow: low (70 > pLDDT > 50), and orange: very low (pLDDT < 50). pLDDT: per-residue confidence score between 0 and 100. The predicted protein structure of Sobic.007G116000 is unavailable.
Figure 2. Leaf blight incidence (blue) and severity (red) differed between the two Niger locations (Bengou and Maradi) based on plot-level observations (two-sample t-test, p < 0.0001).
Figure 3. Pearson’s correlation between accession-level mean incidence and mean severity in Niger was r = 0.61 (p < 0.0001). Points are colored by location.
Figure 4. Genome-wide association results for leaf blight incidence in Niger. The Manhattan plot shows two SNPs exceeding the Bonferroni threshold on chromosomes 7 and 5.
Figure 5. Predicted protein network nodes of top candidate genes. (a) Sobic.007G111800 and (b) Sobic.005G120800. Red beads indicate the candidate genes.
Figure 6. Top 20 SNP Markers Associated with Leaf Blight Incidence in Sorghum as Determined by CatBoost Algorithm. Importance scores of the 20 SNP markers ranked highest for LB incidence in Nigerien and Senegalese sorghum accessions. Marker names indicate chromosome number and base pair position (e.g., S08_10325034 refers to chromosome 8, position 10,325,034).
Figure 7. Top 20 SNP Markers Associated with Leaf Blight Severity in Sorghum as Determined by CatBoost Algorithm. The top 20 most important SNP markers for predicting LB severity in a panel of Nigerien and Senegalese sorghum accessions, as determined by the CatBoost algorithm. The scaled importance score (%) represents the relative contribution of each marker to the model’s predictive accuracy. Marker labels indicate chromosome and position (e.g., S05_38801458 represents chromosome 5, position 38,801,458).
Table 1. Average leaf blight incidence (%) across the two locations in Niger, ordered from high to low. The overall mean is shown in the final entry for reference. The standard error of the mean (SEM) is listed next to each average.

| Accessions | Incidence | SEM | Accessions | Incidence | SEM |
|---|---|---|---|---|---|
| N4 | 100.00 | 0.00 | S29 | 81.00 | 10.21 |
| N29 | 100.00 | 0.00 | S40 | 80.60 | 13.24 |
| N46 | 100.00 | 0.00 | S9 | 80.17 | 16.09 |
| S2 | 100.00 | 0.00 | S4 | 80.00 | 20.00 |
| S6 | 100.00 | 0.00 | S14 | 80.00 | 16.13 |
| S36 | 100.00 | 0.00 | S35 | 80.00 | 20.00 |
| S42 | 100.00 | 0.00 | S45 | 80.00 | 16.33 |
| S49 | 100.00 | 0.00 | S48 | 80.00 | 16.33 |
| S65 | 100.00 | 0.00 | S56 | 80.00 | 13.66 |
| N6 | 98.00 | 2.00 | S15 | 79.57 | 14.58 |
| S37 | 97.80 | 2.20 | S10 | 78.80 | 16.66 |
| BTx623 | 97.60 | 2.40 | S22 | 78.50 | 16.40 |
| N43 | 97.20 | 2.80 | N19 | 78.17 | 15.95 |
| S58 | 97.20 | 2.80 | N36 | 78.17 | 16.43 |
| S38 | 96.00 | 4.00 | N50 | 77.83 | 16.47 |
| N25 | 95.60 | 4.40 | N27 | 77.50 | 15.90 |
| S32 | 95.25 | 4.75 | S27 | 77.50 | 15.90 |
| S19 | 95.00 | 5.00 | S13 | 77.00 | 13.33 |
| S33 | 95.00 | 5.00 | S57 | 76.71 | 9.65 |
| S34 | 95.00 | 3.52 | S55 | 76.33 | 15.91 |
| S59 | 95.00 | 5.00 | SC748-5 | 76.20 | 10.56 |
| S7 | 94.83 | 5.17 | N40 | 76.17 | 9.78 |
| S31 | 94.50 | 5.50 | N53 | 75.83 | 16.20 |
| S44 | 94.00 | 6.00 | N26 | 75.80 | 16.87 |
| S17 | 92.86 | 7.14 | N30 | 75.00 | 25.00 |
| N18 | 92.17 | 7.83 | N41 | 75.00 | 17.08 |
| N60 | 91.83 | 4.13 | S1 | 75.00 | 17.08 |
| N3 | 90.67 | 5.91 | S5 | 75.00 | 19.36 |
| N20 | 90.33 | 7.24 | S50 | 75.00 | 25.00 |
| N9 | 90.00 | 6.63 | N28 | 74.20 | 18.85 |
| N42 | 90.00 | 10.00 | S23 | 73.83 | 16.77 |
| S8 | 90.00 | 6.07 | S60 | 72.57 | 15.31 |
| S28 | 90.00 | 6.83 | N5 | 71.40 | 14.52 |
| S47 | 90.00 | 10.00 | N2 | 71.00 | 19.01 |
| S30 | 89.33 | 5.96 | S12 | 70.17 | 18.04 |
| N48 | 88.60 | 7.87 | S54 | 69.00 | 19.69 |
| N45 | 88.33 | 9.80 | N54 | 68.83 | 17.38 |
| S51 | 88.00 | 12.00 | S46 | 68.50 | 23.21 |
| S11 | 87.50 | 7.92 | S18 | 66.67 | 21.08 |
| S52 | 87.25 | 8.73 | N58 | 64.83 | 20.58 |
| S41 | 86.33 | 8.27 | N24 | 64.00 | 20.40 |
| N44 | 86.00 | 8.07 | N51 | 63.83 | 20.36 |
| N55 | 85.83 | 9.81 | N52 | 60.00 | 24.49 |
| N39 | 84.60 | 11.63 | N22 | 56.80 | 20.79 |
| N57 | 83.67 | 8.74 | S3 | 53.40 | 22.62 |
| N8 | 83.60 | 7.55 | S16 | 50.00 | 22.36 |
| N49 | 83.33 | 16.67 | S43 | 50.00 | 22.36 |
| N56 | 83.33 | 16.67 | S39 | 49.17 | 18.37 |
| N59 | 83.33 | 10.54 | N23 | 48.00 | 18.52 |
| S21 | 83.33 | 16.67 | N38 | 25.00 | 25.00 |
| S20 | 81.33 | 11.91 | Average | 81.26 | 1.37 |
| N34 | 81.17 | 8.52 | | | |
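The mean and SEM columns in Table 1 can be related to plot-level scores as follows. This is a minimal sketch, assuming the SEM is the sample standard deviation divided by the square root of the number of observations; this section does not state the formula, so that is an assumption. The plot values are hypothetical, chosen only because they produce a mean and an SEM of 25.00. They are not the recorded observations for any accession.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SEM")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical incidence scores (%) from four plots; not study data.
hypothetical_plots = [100.0, 0.0, 0.0, 0.0]

avg, sem = mean_and_sem(hypothetical_plots)
print(f"{avg:.2f} ± {sem:.2f}")
```

An SEM as large as the mean, as in this toy case, signals that the average rests on a few highly variable observations and should be interpreted with care.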
Table 2. Average leaf blight severity across the two locations in Niger, ordered from high to low. The overall mean is shown in the final entry for reference. The standard error of the mean (SEM) is listed next to each average.

| Accessions | Severity | SEM | Accessions | Severity | SEM |
|---|---|---|---|---|---|
| S37 | 39.50 | 5.10 | S12 | 24.58 | 7.41 |
| S49 | 38.83 | 8.82 | N2 | 24.40 | 9.07 |
| S33 | 38.00 | 4.79 | S50 | 24.13 | 8.38 |
| S38 | 37.50 | 7.35 | N6 | 23.83 | 4.77 |
| N34 | 37.17 | 6.01 | S36 | 23.50 | 2.00 |
| S32 | 35.50 | 10.80 | S44 | 23.50 | 5.83 |
| S30 | 33.83 | 4.77 | S52 | 23.00 | 6.29 |
| N42 | 33.50 | 3.74 | N19 | 22.92 | 9.15 |
| N43 | 33.50 | 7.35 | N53 | 22.92 | 6.61 |
| S4 | 33.00 | 7.50 | N56 | 22.92 | 7.55 |
| S6 | 32.17 | 6.67 | S1 | 22.92 | 5.51 |
| S42 | 32.17 | 6.15 | N28 | 22.40 | 9.23 |
| S46 | 31.63 | 12.25 | N40 | 22.17 | 2.11 |
| BTx623 | 31.50 | 2.45 | N60 | 22.17 | 2.11 |
| S2 | 31.50 | 10.30 | N51 | 22.00 | 8.01 |
| S34 | 31.50 | 4.00 | N26 | 21.50 | 6.78 |
| S58 | 31.50 | 10.30 | S51 | 21.50 | 5.10 |
| N18 | 30.50 | 5.63 | SC748-5 | 21.50 | 5.10 |
| N50 | 29.58 | 6.95 | S16 | 21.30 | 9.25 |
| N4 | 29.50 | 2.45 | N41 | 21.25 | 8.45 |
| N46 | 29.50 | 6.78 | S14 | 21.25 | 6.17 |
| S10 | 29.50 | 8.12 | S22 | 21.25 | 4.97 |
| S47 | 29.50 | 8.12 | N45 | 20.50 | 5.00 |
| S60 | 29.00 | 6.30 | S11 | 20.50 | 5.00 |
| N3 | 28.83 | 3.33 | S65 | 20.50 | 5.00 |
| N44 | 28.83 | 7.15 | S27 | 19.58 | 7.62 |
| S21 | 28.83 | 12.02 | S48 | 19.58 | 6.68 |
| S15 | 27.57 | 7.90 | N9 | 19.50 | 5.10 |
| N8 | 27.50 | 8.00 | S40 | 19.50 | 5.10 |
| N48 | 27.50 | 4.90 | N52 | 19.30 | 8.09 |
| S8 | 27.50 | 8.60 | S20 | 18.83 | 3.33 |
| N20 | 27.17 | 7.92 | S56 | 18.83 | 4.22 |
| N55 | 27.17 | 5.43 | N58 | 18.67 | 7.12 |
| S7 | 27.17 | 6.54 | S39 | 18.67 | 8.79 |
| S41 | 27.17 | 5.43 | N22 | 18.40 | 8.06 |
| S17 | 26.93 | 5.95 | S5 | 18.40 | 8.06 |
| S9 | 26.25 | 6.18 | S35 | 18.40 | 5.91 |
| S55 | 26.25 | 7.63 | N27 | 17.92 | 4.85 |
| N25 | 25.52 | 7.75 | N36 | 17.92 | 4.10 |
| N5 | 25.50 | 4.47 | S23 | 17.92 | 4.10 |
| N29 | 25.50 | 4.47 | S45 | 17.92 | 6.60 |
| N39 | 25.50 | 4.47 | S59 | 17.50 | 4.90 |
| N57 | 25.50 | 2.58 | N24 | 17.00 | 6.99 |
| N59 | 25.50 | 5.16 | S18 | 17.00 | 5.96 |
| S19 | 25.50 | 3.65 | S54 | 16.25 | 4.61 |
| S28 | 25.50 | 6.32 | N30 | 14.13 | 6.66 |
| S29 | 25.50 | 5.77 | S3 | 13.30 | 5.73 |
| S31 | 25.50 | 4.08 | S43 | 13.30 | 5.73 |
| S57 | 25.50 | 6.17 | N23 | 12.00 | 4.85 |
| S13 | 24.71 | 6.63 | N38 | 8.88 | 8.88 |
| N49 | 24.58 | 5.90 | Average | 24.5 | 0.64 |
| N54 | 24.58 | 7.84 | | | |
Table 3. The five annotated genes nearest to the most significant SNPs associated with leaf blight incidence in Niger. The top two SNPs are statistically significant. The reference and alternate alleles are listed along with the p-value.

| Chr | Location | Candidate Gene and Function | Distance (Base Pairs) | Allele (Reference/Alternate) | p-Value |
|---|---|---|---|---|---|
| 7 | 42352720 | Sobic.007G111800: Lysine ketoglutarate reductase trans-splicing related 1 (DUF707) | 212,604 | C/T | 0.000000041 |
| 5 | 47989804 | Sobic.005G115200: No annotation. Associated PlantFAMs via hmmsearch: Ring finger domain-containing protein | 1,282,409 | T/C | 0.000000069 |
| 7 | 50231251 | Sobic.007G116000: Histone-lysine N-methyltransferase SU(VAR)3-9-related; Zinc finger | 43,037 | A/G | 0.000000096 |
| 5 | 53064984 | Sobic.005G120800: Phosphofructokinase | 78,542 | C/T | 0.00000011 |
| 10 | 15300573 | Sobic.010G125000: Steroid nuclear receptor, ligand-binding, putative, expressed | 5267 | T/C | 0.00000024 |
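The statement that only the top two SNPs in Table 3 are significant follows from a Bonferroni cut, which divides the family-wise error rate by the number of markers tested. The sketch below uses the p-values from Table 3, but the marker count and the 0.05 error rate are assumptions for illustration only: this section does not report the number of SNPs tested, and a different count would move the threshold. The function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# p-values as listed in Table 3, keyed by chromosome and position.
table3_p_values = {
    "Chr7:42352720": 4.1e-8,
    "Chr5:47989804": 6.9e-8,
    "Chr7:50231251": 9.6e-8,
    "Chr5:53064984": 1.1e-7,
    "Chr10:15300573": 2.4e-7,
}

# ASSUMPTION: illustrative marker count and error rate, not reported in this section.
N_MARKERS_ASSUMED = 600_000
ALPHA_ASSUMED = 0.05

threshold = bonferroni_threshold(ALPHA_ASSUMED, N_MARKERS_ASSUMED)
significant = [snp for snp, p in table3_p_values.items() if p < threshold]

print(f"threshold = {threshold:.2e}")
print("pass:", significant)
```

Because the third-ranked SNP sits close to any plausible threshold, small changes in the number of markers tested could change which SNPs are declared significant.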
Table 4. Top Five Candidate Genes Associated with Leaf Blight Incidence and Severity as Determined by CatBoost. Candidate genes located near the top five SNP markers associated with leaf blight incidence and severity were identified using the CatBoost algorithm in Nigerien and Senegalese sorghum germplasms. The table lists the chromosome (Chr), SNP location, candidate gene name and function, distance between the SNP and gene (in base pairs), reference and alternate alleles, and the CatBoost importance score (%).

| Trait | Chr | Location | Candidate Gene and Function | Distance (Base Pairs) | Allele (Reference/Alternate) | Importance (%) |
|---|---|---|---|---|---|---|
| LB-Incidence | 8 | 10325034 | Sobic.008G073700: Serine carboxypeptidase 1 precursor | 84,236 | C/G | 100 |
| LB-Incidence | 10 | 6954311 | Sobic.010G081600: Flavonol-3-O-glycoside-7-O-glucosyltransferase 1 | 5919 | G/A | 49.2 |
| LB-Incidence | 2 | 27002932 | Sobic.002G144732: LETM1-like | 23,252 | A/G | 42.1 |
| LB-Incidence | 1 | 53951729 | Sobic.001G276700: Oligopeptide transporter | 7004 | C/T | 35.6 |
| LB-Incidence | 2 | 52200082 | Sobic.002G167100: Phosphatidate cytidylyltransferase | 34,299 | T/G | 32.7 |
| LB-Severity | 2 | 47024283 | Sobic.002G155000: Embryo defective 1381 | 19,113 | A/G | 100 |
| LB-Severity | 5 | 38801458 | Sobic.005G112566: Uncharacterized protein | 143,175 | A/C | 98.4 |
| LB-Severity | 9 | 24964148 | Sobic.009G094600: Uncharacterized protein | 22,691 | C/G | 97.5 |
| LB-Severity | 6 | 53021705 | Sobic.006G174600: Cis-zeatin O-beta-D-glucosyltransferase | 94 | G/A | 94.6 |
| LB-Severity | 2 | 12376308 | Sobic.002G104500: Leucine-rich repeat (LRR) protein | 610 | T/C | 90.8 |
Table 5. Weather parameters for the two experimental sites.

| Parameter | Dosso region | Maradi region |
|---|---|---|
| Annual rainfall | Average of 700 mm in this region, but up to 814 mm in Gaya from March to October (86% between June and September) | 550 mm from April to October (66% in July and August) |
| Climate | Northern Dosso has a Sahelian climate, while the southern part (Gaya) belongs to the Sahelo-Sudanian climate | Sahelian |
| Mean temperatures during the rainy season | Max: 33 °C; min: 24 °C | Max: 28 °C; min: 23 °C |
| Soil type | Ferruginous tropical in most of this region, but hydromorphic at Bengou and less evolved at the Tara locality | Ferruginous tropical |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Prom, L.K.; Ahn, E.J.S.; Tukuli, A.R.; Botkin, J.R.; Park, S.; Perkin, L.C.; Magill, C.W. Genomic Analysis of Resistance to Exserohilum turcicum in Nigerien and Senegalese Sorghum Using GWAS and Machine Learning. Pathogens 2026, 15, 389. https://doi.org/10.3390/pathogens15040389
