Article

Lipid Metabolism and Actin Cytoskeleton Regulation Underlie Yield and Disease Resistance in Two Coffea canephora Breeding Populations

Sustainable Perennial Crops Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA
*
Author to whom correspondence should be addressed.
Plants 2025, 14(23), 3675; https://doi.org/10.3390/plants14233675
Submission received: 7 November 2025 / Revised: 26 November 2025 / Accepted: 1 December 2025 / Published: 3 December 2025

Abstract

Distinct breeding populations of Coffea canephora often exhibit genetic divergence, yet the biological pathways underlying yield and leaf rust resistance in contrasting populations remain poorly understood. Here, we performed a comparative genomic analysis of two populations (Premature and Intermediate) to dissect the genetic architecture of coffee bean production, green bean yield, and leaf rust incidence. By integrating single-SNP association, machine learning (Bootstrap Forest), and Gene Ontology (GO) pathway analysis, we found that the Premature population’s traits were linked to specialized metabolic pathways, particularly lipid modification and organelle lumen–associated processes. In contrast, the Intermediate population was governed by core cellular machinery, with significant enrichment for actin cytoskeleton regulation and salicylic acid signaling. These findings demonstrate that distinct breeding populations achieve agronomic success through fundamentally different biological strategies and provide a reusable resource of ranked SNP lists for targeted, population-aware breeding.

1. Introduction

Coffee is one of the world’s most valuable agricultural commodities, supporting the livelihoods of millions globally. Production is dominated by Coffea arabica, known for cup quality, and Coffea canephora (robusta), which contributes approximately 40% of global production. C. canephora is increasingly vital due to its inherent disease resistance, higher yield potential, and adaptability to warmer climates [1,2,3]. However, like many perennial crops, coffee breeding faces significant challenges, primarily the long juvenile period and the need for extensive multi-year, multi-location field trials to evaluate complex traits like yield and disease resistance [4,5,6,7,8].
To accelerate genetic progress, genomic selection (GS) and genome-wide association studies (GWAS) have emerged as powerful tools to shorten breeding cycles [9,10,11,12]. While recent studies on C. canephora have successfully demonstrated the utility of genomic prediction [13] and polygenic association models [14], these efforts have largely focused on statistical accuracy rather than biological interpretation. Although selection history is known to drive genetic divergence between breeding populations [15,16,17], the specific biological pathways that drive agronomic performance in distinct populations remain largely unexplored.
Here, we address this gap by comparing the genetic architecture of two distinct C. canephora breeding populations: Premature (early ripening) and Intermediate (late ripening). To achieve this, we integrate single-SNP association, machine learning (Bootstrap Forest), and Gene Ontology (GO) enrichment to dissect the biological mechanisms underlying coffee bean production, leaf rust incidence, and green bean yield. By analyzing these populations side-by-side, we aim to determine whether they achieve agronomic success through shared or distinct biological strategies, providing targeted insights for tailored, population-specific breeding strategies.

2. Materials and Methods

2.1. Analytical Design and Rationale

To dissect the genetic architecture of complex agronomic traits, we employed a multi-layered analytical pipeline. While conventional Single-SNP association allows for the identification of specific genomic loci with major effects, it often lacks the power to capture complex, non-linear interactions inherent in polygenic traits. To overcome this limitation, we integrated a machine learning approach, Bootstrap Forest, which ranks markers based on their explanatory power (variable importance) while accounting for interactions among loci. Finally, to bridge the gap between statistical association and biological mechanism, we performed GO enrichment analysis on the top-ranked loci. This integrated approach allows us to move from identifying individual SNP markers to characterizing the distinct biological pathways driving agronomic performance in each population.

2.2. Experimental Populations and Data

The study analyzed two distinct C. canephora breeding populations, designated Premature and Intermediate, developed by the Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural (Incaper), ES, Brazil. These populations were selected because they represent the core heterotic groups used to extend the harvest season in the region; the Premature population ripens approximately one month earlier than the Intermediate population. The Intermediate population was derived from crosses of 16 progenitors, while the Premature population was selected from 9 progenitors [13].
The study analyzed 119 genotypes from the Intermediate population and 96 genotypes from the Premature population. In 2006, these populations (totaling 3570 and 2880 trees, respectively) were established in the field using a randomized complete block design (RCBD) with three replications. Each experimental plot consisted of five clonal plants. The trials were conducted at two locations to capture environmental variability: Marilândia Experimental Farm (FEM; 19°24′ S, 40°31′ W, 70 m altitude) and Sooretama Experimental Farm (FES; 15°47′ S, 43°18′ W, 40 m altitude). Standard agronomic practices, including fertilization and pest management, were applied uniformly across all plots and years to minimize environmental noise.
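The reported tree totals can be recovered from the design figures given above. The following is a small arithmetic sketch, assuming the totals count trees across both locations; the function and constant names are illustrative and do not come from the original study.

```python
# Field-design arithmetic using only the numbers reported in the text.
REPLICATIONS = 3      # blocks per location in the RCBD
PLANTS_PER_PLOT = 5   # clonal plants per experimental plot
LOCATIONS = 2         # FEM and FES (assumption: totals span both sites)


def total_trees(n_genotypes: int) -> int:
    """Trees = genotypes x replications x plants per plot x locations."""
    return n_genotypes * REPLICATIONS * PLANTS_PER_PLOT * LOCATIONS


intermediate_trees = total_trees(119)
premature_trees = total_trees(96)
print(intermediate_trees, premature_trees)  # 3570 2880
```

Both products match the totals stated in the text (3570 and 2880 trees).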
Phenotypic data were collected over four consecutive harvest years (2008–2011) for three traits: production of coffee beans (mature coffee fruit at the “cherry” stage, in 60 kg bags per hectare), yield of green beans post-harvest (ripened beans, in g, after processing), and natural infection by coffee leaf rust, caused by Hemileia vastatrix. Genotyping was performed using Genotyping-by-Sequencing (GBS), resulting in 45,748 SNPs for the Intermediate population and 59,332 SNPs for the Premature population after quality control. Further details on population development can be found in Ferrão et al. (2019) [13].

2.3. Phenotypic and Genotypic Data Acquisition

Phenotypic data were collected over four consecutive harvest years (2008–2011) [13]. Three key agronomic traits were evaluated:
  • Production of coffee beans: Measured as the quantity of mature fruit (“cherries”) harvested, expressed in 60 kg bags per hectare.
  • Green bean yield: Measured as the weight (g) of processed, dried beans relative to the fresh harvest weight.
  • Leaf rust incidence: Assessed visually using a 1–9 scale based on sporulation intensity, where 1 indicates absence of symptoms (resistant) and 9 indicates severe sporulation (highly susceptible). Scoring was performed during the period of high natural infection pressure to maximize discrimination between genotypes.
To ensure the validity of the maturation groups, fruit ripening was monitored throughout the study. The Premature population consistently reached harvest maturity approximately one month earlier than the Intermediate population across all four years and both locations, confirming the distinct phenotypic behavior of these groups. Standard agronomic practices, including pruning, fertilization, and pest control, were applied uniformly across all plots to minimize environmental variability.
Genotyping was performed using Genotyping-by-Sequencing (GBS). After rigorous quality control (removal of triallelic SNPs, SNPs with a minor allele frequency < 1%, and SNPs with a call rate < 70%), the final dataset comprised 45,748 SNPs for the Intermediate population and 59,332 SNPs for the Premature population.

2.4. Single-SNP Association Analysis

Phenotypic inputs correspond to adjusted means (BLUPs) from the mixed model described by Ferrão et al. [13], which accounts for block and year effects and filters obvious outliers. Using these adjusted means allowed us to focus the genomic analyses on the genetic signal while minimizing environmental noise. We performed a response screening analysis in JMP Pro 17 (SAS Institute Inc., Cary, NC, USA) [18], using the platform’s default settings, to identify individual SNPs associated with each trait. For each population (Premature and Intermediate) and trait combination, the adjusted phenotypic value was used as the response variable, and all SNPs were included as predictor variables. A series of individual linear regressions was performed for each SNP, and a p-value for the association was calculated. This approach tests the additive effect of each SNP individually, assuming a linear relationship between the number of reference alleles and the trait value. To control for the high probability of false positives inherent in testing tens of thousands of SNPs, we applied a False Discovery Rate (FDR) correction to the p-values. Given the low genetic differentiation previously reported between these populations (FST = 0.0158) [13], complex population structure correction was deemed unnecessary for this initial screening, relying instead on the robustness of the adjusted phenotypic means. We chose a stringent FDR-adjusted significance threshold of 0.01 to identify a robust set of high-confidence SNP-trait associations.
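The screening logic described above (one linear regression per SNP, followed by FDR adjustment) can be sketched in a few lines. This is a minimal illustration and not the JMP Pro implementation: it assumes SNPs are coded as allele dosages (0, 1, 2), that the FDR correction is the Benjamini-Hochberg procedure, and that per-SNP p-values have already been computed elsewhere. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def snp_regression(dosages: Sequence[float],
                   phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares of phenotype on allele dosage.

    Returns (slope, r_squared): the additive effect per allele copy and
    the share of phenotypic variance explained by the SNP.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        # Monomorphic SNP or constant phenotype: no estimable effect.
        return 0.0, 0.0
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_snps(snp_ids: Sequence[str],
                     pvalues: Sequence[float],
                     threshold: float = 0.01) -> List[str]:
    """SNP identifiers whose FDR-adjusted p-value is below the threshold."""
    adjusted = benjamini_hochberg(pvalues)
    return [s for s, q in zip(snp_ids, adjusted) if q < threshold]
```

For example, `significant_snps(["s1", "s2"], [0.0001, 0.5])` keeps only `s1` at the 0.01 threshold used in this study.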

2.5. Machine Learning for SNP Importance Analysis

2.5.1. Model Rationale and Implementation

To complement the single-SNP linear regressions, which test each marker individually, we employed a machine learning approach to identify important loci by considering all SNPs simultaneously. Based on a preliminary model screening using JMP Pro 17 (SAS Institute Inc., Cary, NC, USA), the Bootstrap Forest algorithm was selected for all subsequent analyses as it consistently provided the best explanatory performance across the different traits. The Bootstrap Forest algorithm, a robust ensemble method well-suited for high-dimensional genomic data, can capture complex, non-linear, and interactive effects that may be missed by single-marker models [19,20]. Critically, our objective in using this model was not to develop a predictive tool but to rank variables by importance, that is, to identify the SNPs with the greatest explanatory power. Because the goal was explanatory rather than predictive, potential overfitting with respect to performance on a new dataset was not the primary concern. Instead, the model was used to generate a comprehensive and complementary set of candidate genes for our primary objective: the downstream biological pathway analysis. Regarding Linkage Disequilibrium (LD), random forest-type algorithms such as Bootstrap Forest are generally robust to correlated predictors. Although LD can split importance scores among correlated SNPs, our goal was to identify genomic regions of interest rather than single causal variants, making this “grouping effect” advantageous for pathway analysis.

2.5.2. Model Parameters and Variable Importance

To ensure reproducibility and transparency, the Bootstrap Forest models were built using the following fixed parameters: Number of Trees = 100; Bootstrap Sample Rate = 1; Minimum Splits Per Tree = 10; Maximum Splits Per Tree = 2000; Minimum Size Split = 5; and a fixed random seed. The Number of Terms Sampled Per Split was set to 30,333 for the Premature population and 23,086 for the Intermediate population (corresponding to approximately 50% of the total SNPs in each population), to prioritize locus discovery over predictive accuracy. Variable importance for each SNP was then quantified using the “Portion” statistic, which represents the relative contribution of each SNP to the model. The top five SNPs with the highest “Portion” values for each trait and population were identified for further analysis.
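Two of the quantities above lend themselves to a quick worked check: the fraction of SNPs sampled per split, and the ranking of SNPs by “Portion”. The sketch below assumes that “Portion” is each SNP's share of the total sum of squares attributed to splits on that SNP across the forest. This is an interpretation of the statistic rather than a reproduction of JMP's internals, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def sampled_fraction(terms_per_split: int, total_snps: int) -> float:
    """Fraction of all SNPs considered as candidates at each split."""
    return terms_per_split / total_snps


def portion_scores(ss_by_snp: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-SNP sums of squares so that they sum to 1.

    Assumption: `ss_by_snp` holds the sum of squares attributed to each
    SNP across all splits in all trees (supplied by the caller).
    """
    total = sum(ss_by_snp.values())
    if total == 0:
        return {snp: 0.0 for snp in ss_by_snp}
    return {snp: ss / total for snp, ss in ss_by_snp.items()}


def top_snps(portions: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """The k SNPs with the highest Portion, highest first."""
    return sorted(portions.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sampling fractions implied by the settings reported in the text.
print(round(sampled_fraction(30_333, 59_332), 3))  # 0.511 (Premature)
print(round(sampled_fraction(23_086, 45_748), 3))  # 0.505 (Intermediate)
```

Both fractions are close to the approximately 50% stated in the text.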

2.6. Candidate Gene Identification and Gene Ontology Enrichment Analysis

To build a comprehensive list of candidate genes for pathway analysis, we leveraged the significant loci identified from our two complementary analytical frameworks: the statistically significant SNPs from the single-SNP association analysis and the top-ranked SNPs from the Bootstrap Forest importance analysis. For each significant SNP identified by the single-SNP association analysis (FDR-adjusted p-value < 0.01) and for the top five most important SNPs from the Bootstrap Forest analysis, we located the nearest annotated gene. This was performed using the C. canephora DH200-94 reference genome via Coffee Genome Hub (https://coffee-genome-hub.southgreen.fr/coffea_canephora (accessed on 21 February 2025)) [21,22]. For each identified gene, we recorded its gene ID, putative function, and the distance (in base pairs) from the associated SNP.
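The nearest-gene lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used on the Coffee Genome Hub: gene coordinates are taken to be inclusive start and end positions on a named chromosome, a SNP inside a gene has distance 0, and the `Gene` record and function names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # inclusive, base pairs
    end: int    # inclusive, base pairs


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from a SNP position to a gene (0 if inside it)."""
    if gene.start <= pos <= gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - gene.end


def nearest_gene(chrom: str, pos: int,
                 genes: List[Gene]) -> Optional[Tuple[Gene, int]]:
    """Closest annotated gene on the same chromosome, with its distance.

    Returns None when the chromosome has no annotated genes.
    """
    candidates = [g for g in genes if g.chrom == chrom]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: distance_to_gene(pos, g))
    return best, distance_to_gene(pos, best)


# Toy annotation with made-up identifiers and coordinates.
annotation = [
    Gene("geneA", "chr1", 1_000, 2_000),
    Gene("geneB", "chr1", 5_000, 6_000),
    Gene("geneC", "chr2", 100, 900),
]
```

With this toy annotation, `nearest_gene("chr1", 2_500, annotation)` returns `geneA` at a distance of 500 bp. A linear scan is adequate for short SNP lists; an interval index would be preferable for genome-wide lookups.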
To investigate the broader biological pathways underlying the genetic architecture of these traits, we conducted a GO enrichment analysis. For this, a list of candidate genes was generated from the loci of the most important SNPs identified by the Bootstrap Forest models. To capture a robust biological signal for each trait-population combination, we selected the genes associated with the top 100 most important SNPs. This threshold was chosen to capture the highly predictive loci located at the upper tail of the variable importance distribution, ensuring a strong biological signal while excluding low-importance markers that represent background noise. The enrichment analysis was performed using ShinyGO v0.88 [23], which employs hierarchical clustering to reduce functional redundancy among GO terms. The analysis utilized the Coffea canephora gene database (AUK_PRJEB4211_v1), applying a stringent FDR cutoff of 0.05 to control for multiple testing and a pathway size filter to include GO terms containing between 2 and 5000 genes. The complete results of this enrichment analysis for all trait and population combinations are available in Supplementary Data S1.
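The core of a GO enrichment test can be sketched with a hypergeometric upper-tail probability. This is a minimal illustration that assumes a hypergeometric test, a common choice for enrichment tools; it does not reproduce ShinyGO's background selection, redundancy reduction, or FDR step. The function names are hypothetical, and only the 2 to 5000 gene size filter comes from the text.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int,
                         annotated: int, drawn: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    population: number of background genes
    annotated:  background genes carrying the GO term
    drawn:      size of the candidate gene list
    k:          candidate genes carrying the GO term
    """
    max_k = min(annotated, drawn)
    total = comb(population, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, max_k + 1)
    )
    return favorable / total


def passes_size_filter(term_size: int,
                       min_size: int = 2, max_size: int = 5000) -> bool:
    """Pathway size filter: keep GO terms with 2 to 5000 genes."""
    return min_size <= term_size <= max_size


# Toy example: 10 background genes, 4 annotated, 5 drawn, 3 hits.
p_value = hypergeom_upper_tail(3, population=10, annotated=4, drawn=5)
print(round(p_value, 4))  # 0.2619
```

In practice, the resulting p-values for all tested GO terms would then be adjusted for multiple testing, as with the FDR cutoff of 0.05 applied in this study.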

2.7. Reproducibility and Software Specifications

All statistical analyses, including Single-SNP response screening and Bootstrap Forest modeling, were performed using JMP Pro 17 (SAS Institute Inc., Cary, NC, USA). To ensure reproducibility, the Bootstrap Forest models were initialized with a fixed random seed (Seed = 1). The genomic and phenotypic datasets utilized in this study are archived and accessible via Dryad (doi:10.5061/dryad.1139fm7, accessed on 21 February 2025). The C. canephora reference genome (DH200-94) and gene annotation database (AUK_PRJEB4211_v1) were accessed via the Coffee Genome Hub.

3. Results

3.1. Single-SNP Association Analysis of Agronomic Traits

The primary objective of this study was to compare the genetic architecture of two distinct breeding groups. Consequently, the results are presented to highlight the contrasts between the Premature and Intermediate populations across three complementary analytical layers: single-SNP associations, machine learning-based variable importance, and pathway-level biological enrichment.
To identify SNPs associated with key agronomic traits in C. canephora, we performed a response screening analysis examining three traits (coffee bean production, leaf rust incidence, and green bean yield) in two populations (Premature and Intermediate). We applied an FDR-adjusted p-value threshold of 0.01 to identify significant SNP-trait associations.
The single-SNP analysis revealed a stark contrast in the genetic architecture between the two populations (Figure 1). No significant SNP associations were found for the production of coffee beans or the yield of green beans in the Intermediate population (FDR > 0.01). However, for leaf rust incidence in this population, a total of 23 significant SNPs were identified (17 with negative effects and 6 with positive effects) (Figure 1b). In sharp contrast, the Premature population showed thousands of significant associations across all traits. Specifically, we identified 1020 significant SNPs for the production of coffee beans (Figure 1d), 7100 SNPs for leaf rust incidence (Figure 1e), and 1850 SNPs for the yield of green beans (Figure 1f).
This striking disparity in the number of significant associations (thousands in Premature vs. nearly none in Intermediate) implies fundamentally different genetic architectures: the Premature population exhibits an oligogenic structure with detectable major-effect loci, whereas the Intermediate population displays a highly polygenic distribution where individual SNP effects fall below the stringency threshold.
To visualize the genomic distribution of significant associations, we generated Manhattan plots showing the R-squared values for each significant SNP across the 11 chromosomes of C. canephora (Figure 2). For the production of coffee beans in the Premature population, a prominent peak was observed on chromosome 6 (Figure 2a). Significant SNPs were distributed across multiple chromosomes for leaf rust incidence in the Premature population, with particularly strong peaks on chromosomes 5 and 7 (Figure 2b). Significant SNPs were found on chromosomes 2, 5, 9, and 11 for the yield of green beans in the Premature population (Figure 2c). Leaf rust incidence in the Intermediate population exhibited significant associations on chromosomes 1, 2, 5, and 10 (Figure 2d).
Table 1 presents the candidate genes closest to the significant SNPs identified for each population-trait combination. Several of these genes have plausible connections to the traits under investigation. Among the significant SNPs for leaf rust incidence, several genes with known roles in plant defense were identified. In the Premature population, these included a putative disease resistance RPP13-like protein (Cc04t14270.1), an NB-ARC domain-containing protein (Cc03t09860.1), a peroxidase (Cc02t30380.1), and a chitin elicitor receptor kinase 1 (CERK1; Cc05t03340.1) (Table 1). RPP13-like and NB-ARC domain-containing proteins are often involved in pathogen recognition and downstream defense signaling. Peroxidases strengthen cell walls and produce reactive oxygen species during defense responses. CERK1 is a key receptor for chitin, a major component of fungal cell walls, triggering immune responses upon pathogen detection. In the Intermediate population, significant SNPs for leaf rust incidence were located near genes encoding a C2H2-type domain-containing protein (Cc10t04730.1), a WRKY domain-containing protein (Cc10t04810.1), a putative late blight resistance protein homolog R1B-16 (Cc01t08110.1), and a RING-type domain-containing protein (Cc11t03510.1) (Table 1). WRKY transcription factors are known to regulate plant defense responses [24,25], and RING-type proteins often function as E3 ubiquitin ligases, potentially involved in regulating defense signaling [26].
For the yield of green beans in the Premature population, a notable candidate gene was a putative caffeine synthase 3 (Cc09t06990.1), which suggests a potential link between caffeine metabolism and bean characteristics that may influence overall yield. Other candidates across the traits and populations (Table 1) were involved in diverse cellular processes, including protein ubiquitination, signal transduction, and cell wall modification.

3.2. Bootstrap Forest Analysis of SNP Importance for Agronomic Traits

We analyzed variable importance from Bootstrap Forest models to further investigate the genetic architecture of the three agronomic traits and identify the SNPs with the greatest explanatory power for each phenotype. Figure 3 and Figure 4 present Manhattan plots showing the importance score (“Portion”) for each SNP across the 11 chromosomes of C. canephora in the Premature and Intermediate populations, respectively. For the production of coffee beans in the Premature population (Figure 3a), the most important SNPs were concentrated on chromosome 6, suggesting a region of significant influence on this trait. Among the top five candidate genes in this region were those encoding an alpha/beta-hydrolases superfamily protein (Cc06t06270.1) and an IPPc domain-containing protein (Cc06t03050.1) (Table 2). For leaf rust incidence (Figure 3b), several chromosomes exhibited SNPs with high importance scores, particularly chromosomes 5, 7, and 8. The top five candidate genes for this trait included an acyl-coenzyme A oxidase (Cc07t15410.1) and a hydroxyproline-rich glycoprotein family protein (Cc08t07800.1) (Table 2). For the yield of green beans (Figure 3c), chromosome 11 showed a prominent importance peak, with other important SNPs distributed across several chromosomes. A gene encoding a TORTIFOLIA1-like protein 4 (Cc11t13960.1) was among the top five candidates for this trait (Table 2).
In the Intermediate population, the patterns of SNP importance differed from those observed in the Premature population (Figure 4). For the production of coffee beans (Figure 4a), while some SNPs on chromosomes 4 and 7 showed high importance, the overall importance scores were lower compared to the Premature population. Top candidate genes included a non-specific phospholipase C6 (Cc04t10310.1) and a Smr domain-containing protein (Cc07t19900.1) (Table 3). For leaf rust incidence (Figure 4b), chromosomes 5 and 10 exhibited prominent peaks, with a TPR_REGION domain-containing protein (Cc05t15840.1) and a nitrate regulatory gene2 protein (Cc02t25100.1) among the top candidates (Table 3). Only a few SNPs stood out for the yield of green beans, one of which was located on chromosome 4 (Cc04t00320.1; Conserved hypothetical protein) (Figure 4c; Table 3).
Table 2 and Table 3 list the top five candidate genes for each trait in the Premature and Intermediate populations, respectively, ranked by their importance score in the Bootstrap Forest models. It is noteworthy that some genes (Cc06t03050.1: IPPc domain-containing protein, Cc05t02930.1: TAF domain-containing protein, Cc11t13960.1: TORTIFOLIA1-like protein 4, Cc02t25100.1: Nitrate regulatory gene 2, and Cc05t15840.1: TPR_REGION domain-containing protein) were identified as top candidates by both the Bootstrap Forest models and the single-SNP analysis (see Table 1).
Collectively, the top predictors in the Premature population cluster around specific metabolic regulation (e.g., lipid and caffeine pathways), whereas the Intermediate population’s top hits are functionally diverse, involving signaling and structural components, consistent with a broader polygenic base.

3.3. Comparative Analysis and Identification of Consensus Candidate Genes

To further elucidate the biological functions underlying the distinct genetic architectures of the Premature and Intermediate populations, a GO enrichment analysis was conducted. The analysis focused on genes associated with the top 100 predictive SNPs identified by the Bootstrap Forest models for each population, assessing both individual and combined effects.
The results revealed highly distinct biological signatures for each population. In the Premature population, yield-related traits were significantly enriched for specialized cellular functions (Figure 5a,b). The analysis combining all traits highlighted ‘lipid modification’ (GO:0030258) and pathways related to the internal space of organelles, such as ‘membrane-enclosed lumen’ (GO:0031974) and ‘organelle lumen’ (GO:0043233) (Figure 5c).
In contrast, the analysis of the Intermediate population showed a strong and consistent enrichment for pathways involved in regulating the actin cytoskeleton across all trait combinations. Key enriched terms included ‘regulation of actin filament depolymerization’ (GO:0030834), ‘negative regulation of actin filament polymerization’ (GO:0030837), and ‘actin filament capping’ (GO:0051693) (Figure 6). Furthermore, combining yield traits introduced significant enrichment for ‘response to salicylic acid’ (GO:0009751) (Figure 6b), and the further addition of rust resistance highlighted a broader ‘regulation of defense response’ (GO:0031347) pathway (Figure 6c).
To identify overarching biological themes shared between the two populations, a combined GO analysis was performed by pooling the gene lists. Notably, while neither population showed significant GO term enrichment for rust resistance individually, the combined analysis revealed a strong and statistically significant enrichment for ‘chitinase activity’ (GO:0004568), ‘chitin metabolic process’ (GO:0006030 and GO:0006032), and ‘programmed cell death’ (GO:0034050) (Figure 7a). This indicates a shared, high-level defense function that becomes apparent only when data from both populations are pooled. For yield-related traits, the combined analysis was predominantly characterized by pathways involved in actin filament regulation (Figure 7b,c), demonstrating that this strong theme from the Intermediate population defines the combined genetic signal. Interestingly, when all traits from both populations were combined, the enrichment profile shifted to highlight different pathways involved in phosphatidylinositol metabolism and sulfate transmembrane transport (Figure 7d). In summary, these contrasting enrichment profiles demonstrate that while the Premature population relies on specific metabolic adaptations (lipid/lumen), the Intermediate population achieves similar agronomic outcomes through the modulation of fundamental cellular mechanics (actin/signaling).

4. Discussion

Our study provides a new layer of biological interpretation that complements and expands upon previous work that established the potential for genomic prediction [13] and dissected trait stability [27] in these important C. canephora populations. Whereas prior analyses focused on the statistical accuracy of predictive models or the predictability of stability metrics across environments, our primary objective was to dissect the functional basis of the underlying genetic architecture. By integrating single-SNP regression, machine learning for variable importance, and a comparative GO framework, our work moves beyond statistical observation toward mechanistic insight. The central finding of this study, that the Premature and Intermediate populations employ fundamentally different biological strategies to achieve agronomic performance, provides a new biological context for breeding efforts that was previously absent from the literature on these populations. These contrasts indicate that population-aware interpretation is essential: the same trait class can be underpinned by specialized metabolism in one breeding population and by core cellular machinery in another. The dramatic difference in SNP discovery rates suggests that the Premature population is governed by an oligogenic architecture, likely driven by strong selection pressure on its specialized metabolic traits (e.g., lipid modification) [15,16]. In contrast, the Intermediate population is consistent with a polygenic (infinitesimal) model, in which phenotypic variance is controlled by many small-effect loci that collectively act on core cellular machinery but individually lack the statistical power to be detected by single-SNP methods. This may explain why the machine learning approach, which captures cumulative small effects, was essential for characterizing the Intermediate population.
This distinction dictates tailored breeding strategies: the Premature population, with its oligogenic architecture, is a prime candidate for Marker-Assisted Selection (MAS) targeting specific high-effect loci. Conversely, the Intermediate population, governed by a polygenic background, would benefit most from GS models capable of capturing genome-wide small effects that escape traditional single-marker detection [28,29].
Our analysis confirms that the Premature and Intermediate populations possess distinct genomic architectures. This difference is likely attributable to their unique selection histories, which can alter allele frequencies and narrow the genetic base over time [15,16,17]. Additionally, potential differences in linkage disequilibrium patterns [30,31,32], which have been empirically shown to differ between populations based on breeding history [31,32,33,34], and founder effects from their establishment, which are known to shape the genetic makeup of breeding populations [35], may persist and contribute to these population-specific genetic effects [36]. While our single-SNP analysis highlighted a greater number of significant associations in the Premature population, our new pathway-level analysis provides a deeper biological explanation for these differences, moving beyond statistical observation to mechanistic interpretation.
The GO analysis revealed that the two populations employ fundamentally different biological strategies to achieve their agronomic traits. In the Premature population, yield-related traits were significantly enriched for specialized cellular pathways, including ‘lipid modification’ and pathways related to the ‘membrane-enclosed lumen’ of organelles. This is biologically significant, as the vacuole, whose interior is one such membrane-enclosed lumen, is the primary site for sugar and metabolite storage, and its proper function is crucial for determining overall plant yield [37,38]. Indeed, direct manipulation of vacuolar transporters has been shown to increase seed yield in model plants [39]. This pathway-level finding is consistent with our identification of a putative caffeine synthase 3 as a key candidate gene for green bean yield, as the biosynthesis and accumulation of purine alkaloids like caffeine is a key feature of seed and fruit development in Coffea [40,41,42,43]. In stark contrast, the genetic architecture of the Intermediate population was defined by the GO enrichment of pathways related to ‘actin cytoskeleton regulation’. The actin cytoskeleton is fundamental for coordinating all aspects of plant growth, including cell expansion, intracellular transport, and the maintenance of structural integrity [44]; therefore, its enrichment suggests that genetic variation in the Intermediate population influences yield through the overall efficiency of core cellular machinery. This aligns with our identification of candidate genes such as NPC6, a non-specific phospholipase C known to be integral to lipid metabolism and seed oil production [45,46,47], and TPR_REGION domain-containing proteins, which function as core scaffolds for protein–protein interactions within essential cellular complexes involved in protein transport and hormone-mediated stress responses [48,49,50,51,52].
Taken together, the enrichment of organelle lumen and lipid modification pathways in the Premature population and actin-cytoskeleton regulation in the Intermediate population point to distinct routes that may converge on sink strength and carbohydrate partitioning.
A similar divergence in strategy was observed for leaf rust resistance. Our GO analysis of the Intermediate population highlighted the ‘salicylic acid signaling’ pathway, linking its genetic architecture to a known hormonal defense response network that is critical for systemic acquired resistance against biotrophic pathogens [53,54,55,56]. This provides a biological context for candidate genes like the nitrate regulatory gene 2, suggesting a connection between nutrient status and defense signaling in this population [57]. In contrast, the Premature population’s resistance was associated with classical defense genes like RPP13-like protein [58,59,60,61], the NB-ARC domain, which functions as an essential signaling switch in NLR immune receptors [62,63,64,65], and CERK1, a pattern-recognition receptor that is essential for perceiving fungal chitin to trigger PAMP-triggered immunity [66,67,68,69]. The power of a comparative approach was most evident when data from both populations were pooled; only then did a shared, overarching ‘defense response’ theme become statistically significant. This demonstrates that while the specific genetic components differ, they contribute to a common biological function for rust resistance across a broader genetic background. Furthermore, the final combined analysis highlighted pathways of ‘phosphatidylinositol metabolism’ and ‘sulfate transport’ (Figure 7d), suggesting that fundamental membrane signaling and nutrient transport may represent a core biological system underpinning the general fitness and performance across all measured traits. Notably, a chitin/chitinase–programmed cell death module only reached significance when gene lists were pooled across populations, suggesting a convergent downstream immunity axis that single-population analyses may miss. This helps reconcile the Premature population’s CERK1/NLR emphasis with the Intermediate population’s salicylic acid–signaling enrichment.
Our results both complement and expand upon the recent polygenic GWAS analysis by Ferrão et al. [14]. While their Bayesian Sparse Linear Mixed Model (BSLMM) identified major QTL for traits like leaf blight and plant architecture, our use of complementary methods and a population-specific focus provides deeper biological context. Where our findings converge on similar genomic regions, our pathway-level analysis offers a mechanistic hypothesis for why these regions are important, a strategy that has proven effective for revealing molecular mechanisms when integrating GWAS with other functional data [69,70,71]. The novel identification of distinct, pathway-level strategies (specialized metabolism vs. core cellular machinery) is a key contribution that builds upon these previous genomic studies by explaining the biological nature of the genetic variation (Figure 8). Our work also aligns with other studies confirming the complex, polygenic nature of traits like rust resistance in perennial crops such as C. canephora [72,73,74,75,76,77,78].
This study is not without limitations. The machine learning models were developed without an independent validation dataset due to sample size constraints, meaning the results demonstrate explanatory power rather than confirmed predictive ability. This is a critical consideration, as robust cross-validation is essential for accurately assessing and comparing the performance of genomic models [79,80,81]. Furthermore, we can demonstrate only statistical associations between SNPs and traits, not causation, a common challenge in moving from GWAS loci to causal genes [82,83]. The candidate genes we identified are promising targets, but functional validation is essential to confirm their roles, a critical next step for all discovery-based genomic studies [78,83,84,85,86].
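The distinction between explanatory power and predictive ability can be illustrated with a minimal k-fold cross-validation sketch. It is not the analysis performed in this study: a single-marker least-squares regression stands in for the Bootstrap Forest model, the data are simulated, and all function names (`k_fold_indices`, `fit_simple_regression`, `cross_validated_r2`) are hypothetical. The point is only that each sample is predicted by a model fitted without it, so the resulting R² estimates predictive rather than explanatory performance.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:  # marker shows no variation in this training fold
        return y_bar, 0.0
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def cross_validated_r2(x, y, k=5, seed=0):
    """Out-of-fold R^2: each sample is predicted by a model that never saw it."""
    predictions = [0.0] * len(y)
    for test_fold in k_fold_indices(len(y), k, seed):
        held_out = set(test_fold)
        train = [i for i in range(len(y)) if i not in held_out]
        intercept, slope = fit_simple_regression(
            [x[i] for i in train], [y[i] for i in train]
        )
        for i in test_fold:
            predictions[i] = intercept + slope * x[i]
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated allele dosages (0/1/2) and a simulated trait; not study data.
    genotypes = [rng.choice([0, 1, 2]) for _ in range(120)]
    phenotypes = [2.0 * g + rng.gauss(0, 1) for g in genotypes]
    print(round(cross_validated_r2(genotypes, phenotypes), 3))
```

An R² computed on the same samples used for fitting would generally be at least as high as this out-of-fold value; the gap between the two is one rough indicator of overfitting.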
The findings from our GO analysis, however, provide clear, targeted avenues for future work. Functional validation should now prioritize not only single candidate genes like caffeine synthase 3 in the Premature population, but also key regulators within the actin filament polymerization and salicylic acid signaling pathways now identified in the Intermediate population [84]. A deeper investigation of genotype-by-environment interactions, a well-established challenge in coffee breeding, is also crucial [9,13]. By integrating complementary analytical approaches with pathway-level analysis, this work provides a more nuanced, population-specific understanding of the genetic basis of key coffee traits and paves the way for more efficient and targeted coffee breeding programs.

5. Conclusions

This study demonstrates that C. canephora breeding populations can diverge not only in allele frequencies but also in the fundamental biological strategies driving agronomic performance. By integrating multi-layered genomic analyses, we identified that the Premature population leverages specialized metabolic pathways (lipid/organelle lumen) via an oligogenic architecture, whereas the Intermediate population relies on core cellular machinery (actin/signaling) through a polygenic framework. These findings challenge the “one-size-fits-all” approach to molecular breeding, suggesting that breeding strategies must be population-specific: marker-assisted selection for the metabolically driven Premature group and genomic selection (GS) for the polygenic Intermediate group. A key limitation of this study is the reliance on data from specific environments; future work must validate these pathway-level mechanisms across diverse agro-ecological zones to confirm their stability and broader applicability.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants14233675/s1, Supplementary Data S1: The complete GO enrichment results generated and analyzed during the current study. The raw phenotype and SNP data used in this study are available in [13] (https://doi.org/10.5061/dryad.1139fm7).

Author Contributions

Conceptualization, E.A.; methodology, E.A.; software, E.A., S.P., J.B. and S.L.; validation, S.P., J.B. and S.L.; formal analysis, E.A., S.P., J.B. and S.L.; investigation, S.P., J.B. and S.L.; data curation, S.P. and J.B.; writing—original draft preparation, E.A.; writing—review and editing, E.A., S.P., J.B., S.L. and L.W.M.; supervision, E.A.; project administration, E.A. and L.W.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the U.S. Department of Agriculture, Agricultural Research Service, In-House Projects No. 8042-21220-258-000-D and 8042-21000-303-000-D.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: Ferrão et al. [13] (https://doi.org/10.5061/dryad.1139fm7). Additional processed data are contained within the article and Supplementary Materials.

Acknowledgments

We are grateful to the reviewers for their constructive feedback. Mention of any trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer, and all agency services are available without discrimination.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lashermes, P.; Andrade, A.C.; Etienne, H. Genomics of Coffee, One of the World’s Largest Traded Commodities. Genom. Trop. Crop Plants 2008, 1, 203–226.
  2. Millet, C.P.; Delahaie, B.; Georget, F.; Allinne, C.; Solano-Sánchez, W.; Zhang, D.; Jeune, W.; Toniutti, L.; Poncet, V. Guadeloupe and Haiti’s Coffee Genetic Resources Reflect the Crop’s Regional and Global History. Plants People Planet 2025, 7, 245–262.
  3. Campuzano-Duque, L.F.; Herrera, J.C.; Ged, C.; Blair, M.W. Bases for the Establishment of Robusta Coffee (Coffea canephora) as a New Crop for Colombia. Agronomy 2021, 11, 2550.
  4. Capucho, A.; Zambolim, L.; Lopes, U.; Milagres, N. Chemical Control of Coffee Leaf Rust in Coffea canephora Cv. Conilon. Australas. Plant Pathol. 2013, 42, 667–673.
  5. Mishra, M.K. Genetic Resources and Breeding of Coffee (Coffea spp.). In Advances in Plant Breeding Strategies: Nut and Beverage Crops: Volume 4; Springer: Berlin/Heidelberg, Germany, 2020; pp. 475–515.
  6. Merrick, L.F.; Herr, A.W.; Sandhu, K.S.; Lozada, D.N.; Carter, A.H. Optimizing Plant Breeding Programs for Genomic Selection. Agronomy 2022, 12, 714.
  7. Chaves, S.F.; Dias, L.A.; Alves, R.S.; Ferreira, F.M.; Araújo, M.S.; Resende, M.D.; Takahashi, E.K.; Souza, J.E.; Leite, F.P.; Fernandes, S.B. Realized Genetic Gain with Reciprocal Recurrent Selection in a Eucalyptus Breeding Program. Tree Genet. Genomes 2024, 20, 47.
  8. Vieira, R.A.; Nogueira, A.P.O.; Fritsche-Neto, R. Optimizing the Selection of Quantitative Traits in Plant Breeding Using Simulation. Front. Plant Sci. 2025, 16, 1495662.
  9. Alemu, A.; Åstrand, J.; Montesinos-Lopez, O.A.; Y Sanchez, J.I.; Fernandez-Gonzalez, J.; Tadesse, W.; Vetukuri, R.R.; Carlsson, A.S.; Ceplitis, A.; Crossa, J. Genomic Selection in Plant Breeding: Key Factors Shaping Two Decades of Progress. Mol. Plant 2024, 17, 552–578.
  10. Cabrera-Bosquet, L.; Crossa, J.; von Zitzewitz, J.; Serret, M.D.; Luis Araus, J. High-throughput Phenotyping and Genomic Selection: The Frontiers of Crop Breeding Converge. J. Integr. Plant Biol. 2012, 54, 312–320.
  11. Alkimim, E.R.; Caixeta, E.T.; Sousa, T.V.; Resende, M.D.V.; da Silva, F.L.; Sakiyama, N.S.; Zambolim, L. Selective Efficiency of Genome-Wide Selection in Coffea canephora Breeding. Tree Genet. Genomes 2020, 16, 41.
  12. Paixão, P.T.M.; Nascimento, A.C.C.; Nascimento, M.; Azevedo, C.F.; Oliveira, G.F.; da Silva, F.L.; Caixeta, E.T. Factor Analysis Applied in Genomic Selection Studies in the Breeding of Coffea canephora. Euphytica 2022, 218, 42.
  13. Ferrão, L.F.V.; Ferrão, R.G.; Ferrão, M.A.G.; Fonseca, A.; Carbonetto, P.; Stephens, M.; Garcia, A.A.F. Accurate Genomic Prediction of Coffea canephora in Multiple Environments Using Whole-Genome Statistical Models. Heredity 2019, 122, 261–275.
  14. Ferrão, M.A.G.; Da Fonseca, A.F.; Volpi, P.S.; de Souza, L.C.; Comério, M.; Filho, A.C.V.; Riva-Souza, E.M.; Munoz, P.R.; Ferrão, R.G.; Ferrão, L.F.V. Genomic-assisted Breeding for Climate-smart Coffee. Plant Genome 2024, 17, e20321.
  15. Morgante, M.; Salamini, F. From Plant Genomics to Breeding Practice. Curr. Opin. Biotechnol. 2003, 14, 214–219.
  16. Doebley, J.F.; Gaut, B.S.; Smith, B.D. The Molecular Genetics of Crop Domestication. Cell 2006, 127, 1309–1321.
  17. Sun, L.; Lai, M.; Ghouri, F.; Nawaz, M.A.; Ali, F.; Baloch, F.S.; Nadeem, M.A.; Aasim, M.; Shahid, M.Q. Modern Plant Breeding Techniques in Crop Improvement and Genetic Diversity: From Molecular Markers and Gene Editing to Artificial Intelligence—A Critical Review. Plants 2024, 13, 2676.
  18. Klimberg, R. Fundamentals of Predictive Analytics with JMP; SAS Institute: Cary, NC, USA, 2023; ISBN 1-68580-001-7.
  19. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  20. Basu, S.; Kumbier, K.; Brown, J.B.; Yu, B. Iterative Random Forests to Discover Predictive and Stable High-Order Interactions. Proc. Natl. Acad. Sci. USA 2018, 115, 1943–1948.
  21. Dereeper, A.; Bocs, S.; Rouard, M.; Guignon, V.; Ravel, S.; Tranchant-Dubreuil, C.; Poncet, V.; Garsmeur, O.; Lashermes, P.; Droc, G. The Coffee Genome Hub: A Resource for Coffee Genomes. Nucleic Acids Res. 2015, 43, D1028–D1035.
  22. Denoeud, F.; Carretero-Paulet, L.; Dereeper, A.; Droc, G.; Guyot, R.; Pietrella, M.; Zheng, C.; Alberti, A.; Anthony, F.; Aprea, G. The Coffee Genome Provides Insight into the Convergent Evolution of Caffeine Biosynthesis. Science 2014, 345, 1181–1184.
  23. Ge, S.X.; Jung, D.; Yao, R. ShinyGO: A Graphical Gene-Set Enrichment Tool for Animals and Plants. Bioinformatics 2020, 36, 2628–2629.
  24. Jiang, J.; Ma, S.; Ye, N.; Jiang, M.; Cao, J.; Zhang, J. WRKY Transcription Factors in Plant Responses to Stresses. J. Integr. Plant Biol. 2017, 59, 86–101.
  25. Wani, S.H.; Anand, S.; Singh, B.; Bohra, A.; Joshi, R. WRKY Transcription Factors and Plant Defense Responses: Latest Discoveries and Future Prospects. Plant Cell Rep. 2021, 40, 1071–1085.
  26. Duplan, V.; Rivas, S. E3 Ubiquitin-Ligases and Their Target Proteins during the Regulation of Plant Innate Immunity. Front. Plant Sci. 2014, 5, 42.
  27. Adunola, P.; Ferrão, M.A.G.; Ferrão, R.G.; Da Fonseca, A.F.; Volpi, P.S.; Comério, M.; Verdin Filho, A.C.; Munoz, P.R.; Ferrão, L.F.V. Genomic Selection for Genotype Performance and Environmental Stability in Coffea canephora. G3 Genes Genomes Genet. 2023, 13, jkad062.
  28. Bernardo, R. Molecular Markers and Selection for Complex Traits in Plants: Learning from the Last 20 Years. Crop Sci. 2008, 48, 1649–1664.
  29. Heffner, E.L.; Sorrells, M.E.; Jannink, J.-L. Genomic Selection for Crop Improvement. Crop Sci. 2009, 49, 1–12.
  30. Flint-Garcia, S.A.; Thornsberry, J.M.; Buckler IV, E.S. Structure of Linkage Disequilibrium in Plants. Annu. Rev. Plant Biol. 2003, 54, 357–374.
  31. Yu, J.; Buckler, E.S. Genetic Association Mapping and Genome Organization of Maize. Curr. Opin. Biotechnol. 2006, 17, 155–160.
  32. Wright, S.I.; Ness, R.W.; Foxe, J.P.; Barrett, S.C. Genomic Consequences of Outcrossing and Selfing in Plants. Int. J. Plant Sci. 2008, 169, 105–118.
  33. Casler, M. Agricultural Fitness of Smooth Bromegrass Populations Selected for Divergent Fiber Concentration. Crop Sci. 2005, 45, 36–43.
  34. Roncallo, P.F.; Larsen, A.O.; Achilli, A.L.; Pierre, C.S.; Gallo, C.A.; Dreisigacker, S.; Echenique, V. Linkage Disequilibrium Patterns, Population Structure and Diversity Analysis in a Worldwide Durum Wheat Collection Including Argentinian Genotypes. BMC Genom. 2021, 22, 233.
  35. Ladizinsky, G. Founder Effect in Crop-Plant Evolution. Econ. Bot. 1985, 39, 191–199.
  36. Pelc, S.E.; Couillard, D.M.; Stansell, Z.J.; Farnham, M.W. Genetic Diversity and Population Structure of Collard Landraces and Their Relationship to Other Brassica oleracea Crops. Plant Genome 2015, 8, plantgenome2015.04.0023.
  37. Hedrich, R.; Sauer, N.; Neuhaus, H.E. Sugar Transport across the Plant Vacuolar Membrane: Nature and Regulation of Carrier Proteins. Curr. Opin. Plant Biol. 2015, 25, 63–70.
  38. Takatsuka, H.; Higaki, T.; Ito, M. At the Nexus between Cytoskeleton and Vacuole: How Plant Cytoskeletons Govern the Dynamics of Large Vacuoles. Int. J. Mol. Sci. 2023, 24, 4143.
  39. Wingenter, K.; Schulz, A.; Wormit, A.; Wic, S.; Trentmann, O.; Hoermiller, I.I.; Heyer, A.G.; Marten, I.; Hedrich, R.; Neuhaus, H.E. Increased Activity of the Vacuolar Monosaccharide Transporter TMT1 Alters Cellular Sugar Partitioning, Sugar Signaling, and Seed Yield in Arabidopsis. Plant Physiol. 2010, 154, 665–677.
  40. Suzuki, T.; Ashihara, H.; Waller, G.R. Purine and Purine Alkaloid Metabolism in Camellia and Coffea Plants. Phytochemistry 1992, 31, 2575–2584.
  41. Ashihara, H.; Sano, H.; Crozier, A. Caffeine and Related Purine Alkaloids: Biosynthesis, Catabolism, Function and Genetic Engineering. Phytochemistry 2008, 69, 841–856.
  42. Kato, M.; Kitao, N.; Ishida, M.; Morimoto, H.; Irino, F.; Mizuno, K. Expression for Caffeine Biosynthesis and Related Enzymes in Camellia sinensis. Z. Für Naturforschung C 2010, 65, 245–256.
  43. Fu, X.; Li, G.; Hu, F.; Huang, J.; Lou, Y.; Li, Y.; Li, Y.; He, H.; Lv, Y.; Cheng, J. Comparative Transcriptome Analysis in Peaberry and Regular Bean Coffee to Identify Bean Quality Associated Genes. BMC Genom. Data 2023, 24, 12.
  44. Wasteneys, G.O.; Galway, M.E. Remodeling the Cytoskeleton for Growth and Form: An Overview with Some New Views. Annu. Rev. Plant Biol. 2003, 54, 691–722.
  45. Pokotylo, I.; Pejchar, P.; Potocký, M.; Kocourková, D.; Krčková, Z.; Ruelland, E.; Kravets, V.; Martinec, J. The Plant Non-Specific Phospholipase C Gene Family. Novel Competitors in Lipid Signalling. Prog. Lipid Res. 2013, 52, 62–79.
  46. Cai, G.; Fan, C.; Liu, S.; Yang, Q.; Liu, D.; Wu, J.; Li, J.; Zhou, Y.; Guo, L.; Wang, X. Nonspecific Phospholipase C6 Increases Seed Oil Production in Oilseed Brassicaceae Plants. New Phytol. 2020, 226, 1055–1073.
  47. Nakamura, Y.; Ngo, A.H. Non-Specific Phospholipase C (NPC): An Emerging Class of Phospholipase C in Plant Growth and Development. J. Plant Res. 2020, 133, 489–497.
  48. Blatch, G.L.; Lässle, M. The Tetratricopeptide Repeat: A Structural Motif Mediating Protein-protein Interactions. BioEssays 1999, 21, 932–939.
  49. Schapire, A.L.; Valpuesta, V.; Botella, M.A. TPR Proteins in Plant Hormone Signaling. Plant Signal. Behav. 2006, 1, 229–230.
  50. Schlegel, T.; Mirus, O.; Von Haeseler, A.; Schleiff, E. The Tetratricopeptide Repeats of Receptors Involved in Protein Translocation across Membranes. Mol. Biol. Evol. 2007, 24, 2763–2774.
  51. Zeytuni, N.; Zarivach, R. Structural and Functional Discussion of the Tetra-Trico-Peptide Repeat, a Protein Interaction Module. Structure 2012, 20, 397–405.
  52. Zhang, T.; Meng, L.; Kong, W.; Yin, Z.; Wang, Y.; Schneider, J.D.; Chen, S. Quantitative Proteomics Reveals a Role of JAZ7 in Plant Defense Response to Pseudomonas syringae DC3000. J. Proteom. 2018, 175, 114–126.
  53. Gaffney, T.; Friedrich, L.; Vernooij, B.; Negrotto, D.; Nye, G.; Uknes, S.; Ward, E.; Kessmann, H.; Ryals, J. Requirement of Salicylic Acid for the Induction of Systemic Acquired Resistance. Science 1993, 261, 754–756.
  54. Klessig, D.F.; Choi, H.W.; Dempsey, D.A. Systemic Acquired Resistance and Salicylic Acid: Past, Present, and Future. Mol. Plant. Microbe Interact. 2018, 31, 871–888.
  55. Benjamin, G.; Pandharikar, G.; Frendo, P. Salicylic Acid in Plant Symbioses: Beyond Plant Pathogen Interactions. Biology 2022, 11, 861.
  56. Shariatipour, N.; Yazdani, M.; Carlsson, A.; Bengtsson, T.; Kianian, S.F.; Jalli, M.; Rahmatov, M.; PPP RobOat Consortium. Genetic Dissection of Crown Rust Resistance in Oat and the Identification of Key Adult Plant Resistance Genes. Plant Genome 2025, 18, e70059.
  57. Mu, X.; Luo, J. Evolutionary Analyses of NIN-like Proteins in Plants and Their Roles in Nitrate Signaling. Cell. Mol. Life Sci. 2019, 76, 3753–3764.
  58. Bittner-Eddy, P.D.; Crute, I.R.; Holub, E.B.; Beynon, J.L. RPP13 Is a Simple Locus in Arabidopsis thaliana for Alleles That Specify Downy Mildew Resistance to Different Avirulence Determinants in Peronospora parasitica. Plant J. 2000, 21, 177–188.
  59. Bachlava, E.; Radwan, O.E.; Abratti, G.; Tang, S.; Gao, W.; Heesacker, A.F.; Bazzalo, M.E.; Zambelli, A.; Leon, A.J.; Knapp, S.J. Downy Mildew (Pl 8 and Pl 14) and Rust (R Adv) Resistance Genes Reside in Close Proximity to Tandemly Duplicated Clusters of Non-TIR-like NBS-LRR-Encoding Genes on Sunflower Chromosomes 1 and 13. Theor. Appl. Genet. 2011, 122, 1211–1221.
  60. Bish, M.D.; Ramachandran, S.R.; Wright, A.; Lincoln, L.M.; Whitham, S.A.; Graham, M.A.; Pedley, K.F. The Soybean Rpp3 Gene Encodes a TIR-NBS-LRR Protein That Confers Resistance to Phakopsora pachyrhizi. Mol. Plant. Microbe Interact. 2024, 37, 561–570.
  61. Yuan, B.; Li, C.; Wang, Q.; Yao, Q.; Guo, X.; Zhang, Y.; Wang, Z. Identification and Functional Characterization of the RPP13 Gene Family in Potato (Solanum tuberosum L.) for Disease Resistance. Front. Plant Sci. 2025, 15, 1515060.
  62. Sarris, P.F.; Cevik, V.; Dagdas, G.; Jones, J.D.; Krasileva, K.V. Comparative Analysis of Plant Immune Receptor Architectures Uncovers Host Proteins Likely Targeted by Pathogens. BMC Biol. 2016, 14, 8.
  63. Chandra, S.; Kazmi, A.Z.; Ahmed, Z.; Roychowdhury, G.; Kumari, V.; Kumar, M.; Mukhopadhyay, K. Genome-Wide Identification and Characterization of NB-ARC Resistant Genes in Wheat (Triticum aestivum L.) and Their Expression during Leaf Rust Infection. Plant Cell Rep. 2017, 36, 1097–1112.
  64. Dubey, N.; Singh, K. Role of NBS-LRR Proteins in Plant Defense. In Molecular Aspects of Plant-Pathogen Interaction; Springer: Singapore, 2018; pp. 115–138.
  65. Wang, J.; Chen, T.; Han, M.; Qian, L.; Li, J.; Wu, M.; Han, T.; Cao, J.; Nagalakshmi, U.; Rathjen, J.P. Plant NLR Immune Receptor Tm-22 Activation Requires NB-ARC Domain-Mediated Self-Association of CC Domain. PLoS Pathog. 2020, 16, e1008475.
  66. Zipfel, C. Early Molecular Events in PAMP-Triggered Immunity. Curr. Opin. Plant Biol. 2009, 12, 414–420.
  67. García, Y.H.; Zamora, O.R.; Troncoso-Rojas, R.; Tiznado-Hernández, M.E.; Báez-Flores, M.E.; Carvajal-Millan, E.; Rascón-Chu, A. Toward Understanding the Molecular Recognition of Fungal Chitin and Activation of the Plant Defense Mechanism in Horticultural Crops. Molecules 2021, 26, 6513.
  68. Fan, A.; Wei, L.; Zhang, X.; Liu, J.; Sun, L.; Xiao, J.; Wang, Y.; Wang, H.; Hua, J.; Singh, R.P. Heterologous Expression of the Haynaldia villosa Pattern-Recognition Receptor CERK1-V in Wheat Increases Resistance to Three Fungal Diseases. Crop J. 2022, 10, 1733–1745.
  69. Wang, L.; He, Y.; Guo, G.; Xia, X.; Dong, Y.; Zhang, Y.; Wang, Y.; Fan, X.; Wu, L.; Zhou, X. Overexpression of Plant Chitin Receptors in Wheat Confers Broad-spectrum Resistance to Fungal Diseases. Plant J. 2024, 120, 1047–1063.
  70. Daryani, P.; Darzi Ramandi, H.; Dezhsetan, S.; Mirdar Mansuri, R.; Hosseini Salekdeh, G.; Shobbar, Z.-S. Pinpointing Genomic Regions Associated with Root System Architecture in Rice through an Integrative Meta-Analysis Approach. Theor. Appl. Genet. 2022, 135, 81–106.
  71. Yin, X.; Bose, D.; Kwon, A.; Hanks, S.C.; Jackson, A.U.; Stringham, H.M.; Welch, R.; Oravilahti, A.; Silva, L.F.; Locke, A.E. Integrating Transcriptomics, Metabolomics, and GWAS Helps Reveal Molecular Mechanisms for Metabolite Levels and Disease Risk. Am. J. Hum. Genet. 2022, 109, 1727–1741.
  72. DeHaan, L.R.; Van Tassel, D.L. Useful Insights from Evolutionary Biology for Developing Perennial Grain Crops. Am. J. Bot. 2014, 101, 1801–1819.
  73. Álvarez, M.F.; Mosquera, T.; Blair, M.W. The Use of Association Genetics Approaches in Plant Breeding. Plant Breed. Rev. 2014, 38, 17–68.
  74. Silva, L.F. Estudo de Associação Genômica Ampla (GWAS) em Coffea canephora. Master’s Thesis, Universidade Federal de Viçosa, Viçosa, Brazil, 2018.
  75. de Faria Silva, L.; Alkimim, E.R.; Barreiro, P.R.R.M.; Leichtweis, B.G.; Silva, A.C.A.; da Silva, R.A.; Sousa, T.V.; Nascimento, M.; Caixeta, E.T. Genome-Wide Association Study of Plant Architecture and Diseases Resistance in Coffea canephora. Euphytica 2022, 218, 92.
  76. Paape, T.; Heiniger, B.; Santo Domingo, M.; Clear, M.R.; Lucas, M.M.; Pueyo, J.J. Genome-Wide Association Study Reveals Complex Genetic Architecture of Cadmium and Mercury Accumulation and Tolerance Traits in Medicago truncatula. Front. Plant Sci. 2022, 12, 806949.
  77. Altaf, M.T.; Liaqat, W.; Jamil, A.; Mohamed, H.I.; Fahad, M.; Jan, M.F.; Baloch, F.S. A Critical Review: Breeding Objectives, Genomic Resources, and Marker-Assisted Methods in Sorghum (Sorghum bicolor L.). J. Soil Sci. Plant Nutr. 2024, 24, 4597–4623.
  78. Kumar, R.; Das, S.P.; Choudhury, B.U.; Kumar, A.; Prakash, N.R.; Verma, R.; Chakraborti, M.; Devi, A.G.; Bhattacharjee, B.; Das, R. Advances in Genomic Tools for Plant Breeding: Harnessing DNA Molecular Markers, Genomic Selection, and Genome Editing. Biol. Res. 2024, 57, 80.
  79. Werner, C.R.; Gaynor, R.C.; Gorjanc, G.; Hickey, J.M.; Kox, T.; Abbadi, A.; Leckband, G.; Snowdon, R.J.; Stahl, A. How Population Structure Impacts Genomic Selection Accuracy in Cross-Validation: Implications for Practical Breeding. Front. Plant Sci. 2020, 11, 592977.
  80. Dekkers, J.C.; Su, H.; Cheng, J. Predicting the Accuracy of Genomic Predictions. Genet. Sel. Evol. 2021, 53, 55.
  81. Schrauf, M.F.; de Los Campos, G.; Munilla, S. Comparing Genomic Prediction Models by Means of Cross Validation. Front. Plant Sci. 2021, 12, 734512.
  82. Schaefer, R.J.; Michno, J.-M.; Jeffers, J.; Hoekenga, O.; Dilkes, B.; Baxter, I.; Myers, C.L. Integrating Coexpression Networks with GWAS to Prioritize Causal Genes in Maize. Plant Cell 2018, 30, 2922–2942.
  83. Cano-Gamez, E.; Trynka, G. From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases. Front. Genet. 2020, 11, 424.
  84. Thomson, M.J.; Biswas, S.; Tsakirpaloglou, N.; Septiningsih, E.M. Functional Allele Validation by Gene Editing to Leverage the Wealth of Genetic Resources for Crop Improvement. Int. J. Mol. Sci. 2022, 23, 6565.
  85. Tsakirpaloglou, N.; Septiningsih, E.M.; Thomson, M.J. Guidelines for Performing CRISPR/Cas9 Genome Editing for Gene Validation and Trait Improvement in Crops. Plants 2023, 12, 3564.
  86. Sahito, J.H.; Zhang, H.; Gishkori, Z.G.N.; Ma, C.; Wang, Z.; Ding, D.; Zhang, X.; Tang, J. Advancements and Prospects of Genome-Wide Association Studies (GWAS) in Maize. Int. J. Mol. Sci. 2024, 25, 1918.
Figure 1. Volcano plots of single-SNP association analysis for three agronomic traits in two C. canephora populations. Each point represents a single SNP. The x-axis shows the estimated slope (effect size) from a linear regression of the phenotype on the SNP genotype, and the y-axis shows the negative base-10 logarithm of the p-value from the association test. SNPs are colored based on the direction of their effect and statistical significance after applying an FDR correction to account for multiple testing. Red points indicate SNPs with a positive effect and FDR-adjusted p-value < 0.01; blue points indicate SNPs with a negative effect and FDR-adjusted p-value < 0.01; gray points do not meet the significance threshold. The plots highlight a substantially larger number of significant associations in the Premature population compared to the Intermediate population. (a) Production of coffee beans, Intermediate. (b) Leaf rust incidence, Intermediate. (c) Yield of green beans, Intermediate. (d) Production of coffee beans, Premature. (e) Leaf rust incidence, Premature. (f) Yield of green beans, Premature.
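The coloring rule in the Figure 1 caption can be sketched in a few lines. The caption states only that an FDR correction was applied; the Benjamini-Hochberg procedure used here is an assumption, and the function names (`benjamini_hochberg`, `neg_log10`, `volcano_category`) are hypothetical. Slopes and raw p-values are taken as given inputs rather than computed from genotype data.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the source says 'FDR correction' without naming a procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(p_value):
    """y-axis value of the volcano plot."""
    return -math.log10(p_value)


def volcano_category(slope, adjusted_p, threshold=0.01):
    """Color class as described in the caption: red, blue, or gray."""
    if adjusted_p >= threshold:
        return "gray"
    return "red" if slope > 0 else "blue"


if __name__ == "__main__":
    # Illustrative values only; these are not results from the study.
    slopes = [0.8, -1.2, 0.1, -0.05]
    raw_p = [1e-6, 2e-5, 0.03, 0.6]
    for slope, p, q in zip(slopes, raw_p, benjamini_hochberg(raw_p)):
        print(slope, round(neg_log10(p), 2), volcano_category(slope, q))
```

Because the adjustment depends on the total number of tests, the same raw p-value can fall on either side of the 0.01 threshold depending on how many SNPs were tested.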
Figure 2. Manhattan plots of single-SNP association analysis, showing only SNPs that reached genome-wide significance (FDR < 0.01). Each point represents a single SNP. The x-axis shows the chromosomal location (1 through 11). The y-axis shows the R-squared value, representing the proportion of phenotypic variance explained by each individual SNP. SNPs are colored based on the direction of their effect: red indicates a positive effect, and blue indicates a negative effect. Gray points represent SNPs that did not reach the significance threshold. Each panel reveals the genomic regions most strongly associated with the respective trait. (a) Production of coffee beans, Premature. (b) Leaf rust incidence, Premature. (c) Yield of green beans, Premature. (d) Leaf rust incidence, Intermediate. Plots for two trait-population combinations are not shown due to a lack of significant associations.
Figure 3. Manhattan plots showing variable importance from Bootstrap Forest models for agronomic traits in the Premature population. The x-axis indicates the chromosomal position of each SNP. The y-axis represents the variable importance score (“Portion”), which quantifies the relative contribution of each SNP to the model’s explanatory power. Higher values indicate greater importance. Points are colored in alternating red and blue to distinguish between adjacent chromosomes. (a) Production of coffee beans. (b) Leaf rust incidence. (c) Yield of green beans.
Figure 4. Manhattan plots showing variable importance from Bootstrap Forest models for agronomic traits in the Intermediate population. The x-axis indicates the chromosomal position of each SNP. The y-axis represents the variable importance score (“Portion”), which quantifies the relative contribution of each SNP to the model’s explanatory power. Higher values indicate greater importance. Points are colored in alternating red and blue to distinguish between adjacent chromosomes. (a) Production of coffee beans. (b) Leaf rust incidence. (c) Yield of green beans.
Figure 5. GO enrichment analysis of candidate genes in the C. canephora Premature population. The analysis was performed using genes associated with the top 100 predictive SNPs from the Bootstrap Forest models for different trait combinations. The plots illustrate the most significantly enriched biological pathways. (a) Enriched GO terms for genes associated with green bean yield, highlighting pathways related to the internal space of organelles. (b) Enriched terms for the combined traits of coffee bean production and green bean yield, which include the addition of a ‘cellular macromolecule biosynthetic process’. (c) Enriched terms for the combination of all three traits (coffee bean production, green bean yield, and leaf rust incidence), identifying ‘lipid modification’ as another key pathway. In all plots, the x-axis represents the fold enrichment of the term, the size of each point corresponds to the number of genes associated with the term, and the color indicates the statistical significance based on the −log10(FDR). Abbreviations used: reg., regulation; proc., process.
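For readers unfamiliar with the x-axis quantity in these enrichment plots, the sketch below shows how fold enrichment is commonly computed: the share of the candidate gene list annotated to a term divided by the corresponding share in the background. The hypergeometric upper-tail test is included as an assumption about how enrichment p-values are typically obtained before FDR adjustment; the caption itself does not specify the test. Function names and the example counts are hypothetical.

```python
from math import comb


def fold_enrichment(hits, list_size, term_size, background_size):
    """(hits / list_size) divided by (term_size / background_size)."""
    return (hits / list_size) / (term_size / background_size)


def hypergeom_upper_tail(hits, list_size, term_size, background_size):
    """P(X >= hits) when list_size genes are drawn without replacement
    from a background containing term_size annotated genes."""
    total = comb(background_size, list_size)
    upper = min(list_size, term_size)
    favourable = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical counts: 6 of 80 candidate genes carry a term that
    # annotates 200 of 25,000 background genes.
    print(round(fold_enrichment(6, 80, 200, 25000), 2))
    print(hypergeom_upper_tail(6, 80, 200, 25000))
```

A large fold enrichment based on very few genes can still have a weak p-value, which is why the plots encode gene count (point size) and significance (color) alongside fold enrichment.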
Figure 6. GO enrichment analysis of candidate genes in the C. canephora Intermediate population. The analysis was performed on genes associated with the top 100 predictive SNPs from the Bootstrap Forest models for various trait combinations. (a) For the green bean yield trait, the analysis reveals a strong enrichment for pathways involved in the regulation of the actin cytoskeleton. (b) Combining coffee bean production and green bean yield traits retains the actin-related terms and introduces significant enrichment for signaling pathways, including ‘response to salicylic acid’. (c) The analysis of all three traits combined (coffee bean production, green bean yield, and leaf rust incidence) reinforces the importance of defense-related signaling (‘salicylic acid mediated signaling pathway’, ‘regulation of defense response’) alongside the persistently significant actin regulation pathways. In all plots, the x-axis represents the fold enrichment, the point size corresponds to the number of genes, and the color indicates statistical significance based on the −log10(FDR). Abbreviations used: reg., regulation; proc., process.
Figure 7. GO enrichment analysis of candidate genes from the combined Premature and Intermediate populations. The analysis was performed on pooled gene lists associated with up to the top 100 predictive SNPs from the Bootstrap Forest models in both populations. (a) The combined analysis for leaf rust resistance reveals a strong enrichment for defense-related pathways, including ‘chitinase activity’ and ‘immune response’. (b) For the combined green bean yield trait, the analysis is characterized by a strong enrichment for pathways related to ‘actin cytoskeleton regulation’. (c) Adding coffee bean production to green bean yield retains the dominant actin regulation theme while also introducing pathways related to ‘salicylic acid-mediated signaling’. (d) When all three traits (coffee bean production, green bean yield, and leaf rust incidence) are combined, the enrichment profile highlights pathways involved in ‘phosphatidylinositol metabolism’ and ‘sulfate transmembrane transport’. In all plots, the x-axis represents the fold enrichment, the point size corresponds to the number of genes, and the color indicates statistical significance based on the −log10(FDR). Abbreviations used: reg., regulation; proc., process.
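The two plotted quantities in the GO enrichment figures, fold enrichment and −log10(FDR), can be illustrated with a minimal sketch. It assumes the common definition of fold enrichment (the fraction of the candidate gene list annotated with a term divided by the fraction of the background annotated with it); the captions do not state the exact formula the enrichment tool uses, and all counts below are hypothetical.

```python
import math


def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Ratio of the term's frequency in the gene list to its background frequency.

    Assumed definition; the enrichment tool behind the figures may differ.
    """
    list_fraction = hits_in_list / list_size
    background_fraction = hits_in_background / background_size
    return list_fraction / background_fraction


def neg_log10(fdr):
    """Transform an FDR value so that smaller FDRs map to larger scores."""
    return -math.log10(fdr)


# Hypothetical counts: 5 of 100 candidate genes carry a term that
# annotates 50 of 10,000 background genes.
example_fold = fold_enrichment(5, 100, 50, 10_000)
example_score = neg_log10(0.001)
```

Under this definition, a term appearing in 5% of the candidate genes but only 0.5% of the background has a tenfold enrichment, and an FDR of 0.001 corresponds to a color-scale value of 3.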
Figure 8. A conceptual model summarizing the distinct biological strategies for agronomic performance in the Premature and Intermediate Coffea canephora populations. The model visually contrasts the key findings from the comparative genomic analysis. The genetic architecture of the Premature population is linked to specialized metabolic pathways, including “lipid modification” and the cellular component “organelle lumen,” and is highlighted by a key candidate gene, a putative caffeine synthase 3. In contrast, the Intermediate population’s traits are governed by variation in core cellular machinery & signaling, with significant enrichment for pathways like “actin cytoskeleton regulation” and “salicylic acid signaling.” This figure illustrates that the two populations achieve agronomic success through fundamentally different, population-specific biological routes.
Table 1. Candidate genes associated with SNPs showing significant associations with agronomic traits in two C. canephora populations (Premature and Intermediate).
| SNP ID | Nearest Gene and Function | Base Pairs Away | FDR p-Value | Effect Size | R-Square |
|---|---|---|---|---|---|
| Premature, production of coffee beans: Response Screening | | | | | |
| 6.2295851 | Cc06t02920.1; ADF-H domain-containing protein | 0 | 2.12 × 10^−10 | 0.49 | 0.35 |
| 6.2378930 | Cc06t03050.1; IPPc domain-containing protein | 0 | 1.27 × 10^−10 | 0.49 | 0.36 |
| 6.2075614 | Cc06t02620.1; E3 ubiquitin-protein ligase | 0 | 0.00000000049 | 0.48 | 0.34 |
| 6.4097514 | Cc06t05200.1; NPL domain-containing protein | −29 | 0.0000000025 | 0.46 | 0.32 |
| 6.4477759 | Cc06t05570.1; IU_nuc | 0 | 0.0000000026 | 0.47 | 0.32 |
| 4.15969587 | Cc04t13140.1; UDP-glycosyltransferase 83A1 | −25,832 | 0.00000027 | 0.41 | 0.25 |
| 10.6890851 | Cc10t07860.1; GH10 domain-containing protein | 0 | 0.0000015 | 0.39 | 0.22 |
| 5.13748592 | Cc05t03040.1; Putative Short-chain dehydrogenase reductase ATA1 | −47,435 | 0.0000015 | 0.39 | 0.22 |
| 2.17352904, 2.17352905 | Cc02t19180.1; Stress enhanced protein 1, chloroplastic | 0 | 0.0000026 | 0.38 | 0.21 |
| 2.18145802 | Cc02t20300.1; SCP domain-containing protein | +3008 | 0.0000027 | 0.38 | 0.21 |
| Premature, leaf rust incidence: Response Screening | | | | | |
| 7.15362801 | Cc07t18070.1; Hexosyltransferase | −1304 | 0.000000021 | 0.54 | 0.29 |
| 11.32751026 | Cc11t16690.1; Urease | 0 | 0.000000038 | 0.53 | 0.28 |
| 4.20658374 | Cc04t14270.1; Putative disease resistance RPP13-like protein 3 | −32,177 | 0.00000023 | 0.5 | 0.25 |
| 3.12306478 | Cc03t09860.1; NB-ARC domain-containing protein | 0 | 0.00000039 | 0.49 | 0.24 |
| 7.13312636 | Cc07t16380.1; Conserved hypothetical protein | +1562 | 0.00000058 | 0.49 | 0.23 |
| 2.35607328 | Cc02t30380.1; Peroxidase | 0 | 0.0000014 | 0.59 | 0.34 |
| 5.13342765 | Cc05t02930.1; TAF domain-containing protein | +27,160 | 0.0000014 | 0.59 | 0.34 |
| 5.14130887 | Cc05t03220.1; Lycopene beta/epsilon cyclase protein | −3241 | 0.0000014 | 0.59 | 0.34 |
| 5.14494464, 5.14494471, 5.14494484 | Cc05t03270.1; AT1G05060.1 | 0 | 0.0000014 | 0.58 | 0.34 |
| 5.14625364 | Cc05t03340.1; Chitin elicitor receptor kinase 1 | 0 | 0.0000014 | 0.58 | 0.34 |
| Premature, yield of green beans: Response Screening | | | | | |
| 11.12510898 | Cc11t03410.1; Protein of unknown function (DUF789) | +10,726 | 0.000000016 | 0.52 | 0.29 |
| 2.24864284 | Cc02t27160.1; Vicianin hydrolase | 0 | 0.000000018 | 0.52 | 0.29 |
| 9.6114101 | Cc09t05750.1; T-complex protein 1 subunit gamma | 0 | 0.0000000052 | 0.54 | 0.31 |
| 11.7875427 | Cc11t02410.1; RING-type domain-containing protein | 0 | 0.000000063 | 0.5 | 0.27 |
| 1.3497181 | Cc01t01920.1; SKP1-like protein 4 | +5743 | 0.00000023 | 0.48 | 0.25 |
| 11.30714537 | Cc11t13980.1; HDAC_interact domain-containing protein | 0 | 0.000000021 | 0.52 | 0.29 |
| 9.8293367 | Cc09t06990.1; Putative caffeine synthase 3 | 0 | 0.000000038 | 0.51 | 0.28 |
| 11.30518300 | Cc11t13720.1; GSDH domain-containing protein | 0 | 0.000000067 | 0.5 | 0.27 |
| 11.30697733 | Cc11t13960.1; TORTIFOLIA1-like protein 4 | 0 | 0.000000071 | 0.5 | 0.27 |
| 5.26247026 | Cc05t12380.1; Transducin/WD40 repeat-like superfamily protein | −3088 | 0.00000006 | 0.5 | 0.27 |
| Intermediate, leaf rust incidence: Response Screening | | | | | |
| 2.22416916 | Cc02t25100.1; Nitrate regulatory gene 2 protein | 0 | 0.00000011 | 0.56 | 0.21 |
| 10.3678570 | Cc10t04730.1; C2H2-type domain-containing protein | 0 | 0.00000026 | 0.54 | 0.2 |
| 10.3747377 | Cc10t04810.1; WRKY domain-containing protein | 0 | 0.0000012 | 0.52 | 0.18 |
| 1.26450374, 1.26450396 | Cc01t08110.1; Putative late blight resistance protein homolog R1B-16 | 0 | 0.0000016 | 0.51 | 0.18 |
| 1.29976260, 1.29976261, 1.29976262 | Cc01t11280.1; Conserved hypothetical protein | −2741 | 0.0000013 | 0.51 | 0.18 |
| 5.28610373 | Cc05t15840.1; TPR_REGION domain-containing protein | 0 | 0.00000011 | 0.56 | 0.21 |
| 7.3420157 | Cc07t04860.1; AAA domain-containing protein | 0 | 0.0000027 | 0.50 | 0.17 |
| 11.13698759, 11.13698762 | Cc11t03510.1; RING-type domain-containing protein | −337,600 | 0.0000035 | 0.50 | 0.17 |
| 4.27729192 | Cc04t17060.1; BHLH domain-containing protein | −85 | 0.0000032 | 0.50 | 0.17 |
| 10.3876532 | Cc10t04930.1; SASA domain-containing protein | −2641 | 0.0000048 | 0.49 | 0.16 |
SNPs were identified as significant based on a Response Screening analysis in JMP Pro 17, with an FDR-adjusted p-value threshold of 0.01. For each significant SNP, the table lists: the SNP identifier (SNP ID) (blue = negative and red = positive association); the nearest gene and its putative function (based on the annotation of the C. canephora reference genome); the distance in base pairs between the SNP and the start of the nearest gene (0 indicates that the SNP lies within the gene; negative and positive distances denote SNPs located upstream and downstream of the gene’s transcription start site (TSS), respectively); the FDR-adjusted p-value; the estimated effect size (slope from a linear regression of the phenotype on the SNP genotype); and the R-squared value (proportion of phenotypic variance explained by the SNP). Genes are listed separately for each population and trait combination. Only traits with at least one significant SNP are included.
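The three per-SNP statistics described in the Table 1 note (effect size as a regression slope, R-squared, and an FDR-adjusted p-value) can be sketched as follows. This is an illustration only, not the JMP Pro implementation: it assumes genotypes are coded as allele dosages (0, 1, 2) and that the FDR adjustment follows the Benjamini-Hochberg procedure, neither of which is stated in the note. The function names and example values are hypothetical.

```python
from statistics import mean


def snp_effect(genotypes, phenotypes):
    """Simple linear regression of phenotype on genotype dosage.

    Returns (slope, r_squared). Assumes dosage coding (0/1/2) and at
    least two distinct genotype and phenotype values.
    """
    g_bar, p_bar = mean(genotypes), mean(phenotypes)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    syy = sum((p - p_bar) ** 2 for p in phenotypes)
    sxy = sum((g - g_bar) * (p - p_bar) for g, p in zip(genotypes, phenotypes))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: six plants, phenotype rising by one unit per allele copy.
slope, r2 = snp_effect([0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.01 for q in fdr]  # threshold quoted in the table note
```

A SNP would enter a table like Table 1 only if its adjusted p-value falls at or below the 0.01 threshold; the slope and R-squared are then reported alongside it.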
Table 2. Top five candidate genes identified by Bootstrap Forest models for three agronomic traits in the Premature population.
| SNP ID | Nearest Gene and Function | Base Pairs Away | Importance Score (Portion) |
|---|---|---|---|
| Premature, production of coffee beans: Bootstrap Forest | | | |
| 6.4939167 | Cc06t06270.1; Alpha/beta-Hydrolases superfamily protein | 0 | 0.037 |
| 6.2378930 | Cc06t03050.1; IPPc domain-containing protein | +172 | 0.026 |
| 1.2020393 | Cc01t01290.1; Putative 60S ribosomal protein L23a-1 | +6605 | 0.019 |
| 7.15086633 | Cc07t17840.1; NB-ARC domain-containing protein | +19 | 0.015 |
| 6.1079450 | Cc06t01300.1; HEAT repeat-containing protein | 0 | 0.013 |
| Premature, leaf rust incidence: Bootstrap Forest | | | |
| 7.12204503 | Cc07t15410.1; Acyl-coenzyme A oxidase | +308 | 0.2 |
| 8.21181446 | Cc08t07800.1; Hydroxyproline-rich glycoprotein family protein | 0 | 0.07 |
| 5.13342765 | Cc05t02930.1; TAF domain-containing protein | +27,160 | 0.06 |
| 1.32477500 | Cc01t14390.1; Increased DNA methylation like | 0 | 0.06 |
| 10.1733570 | Cc10t02280.1; OBG-type G domain-containing protein | 0 | 0.056 |
| Premature, yield of green beans: Bootstrap Forest | | | |
| 11.30697733 | Cc11t13960.1; TORTIFOLIA1-like protein 4 | 0 | 0.14 |
| 9.3350785 | Cc09t03950.1; NAD(P)-binding Rossmann-fold superfamily protein | 0 | 0.1 |
| 5.24257788 | Cc05t09820.1; Glucose-1-phosphate adenylyltransferase | −1708 | 0.088 |
| 7.20811138 | Cc07t20130.1; Protein of unknown function (DUF1365) | +26,561 | 0.073 |
| 10.20833232 | Cc10t11950.1; tRNA (guanine(37)-N1)-methyltransferase | 0 | 0.061 |
SNPs were ranked based on their variable importance (“Portion”) in the models, with the corresponding genomic locations shown in Figure 3. The table lists the SNP identifier (SNP ID); the nearest gene and its putative function; the distance in base pairs between the SNP and the gene’s start site (0 indicates that the SNP lies within the gene; negative and positive distances correspond to positions upstream and downstream of the transcription start site (TSS), respectively); and the importance score from the model.
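The signed-distance convention used in the "Base Pairs Away" column can be made explicit with a short sketch. It parses the distance strings as they appear in the tables (including thousands separators and the Unicode minus sign) and labels each SNP as within the gene, upstream, or downstream. The function name is hypothetical; the three example rows are taken from the green bean yield block of Table 2.

```python
def classify_distance(raw):
    """Convert a table distance such as '−1708' or '+26,561' to (offset, label).

    0 means the SNP lies within the gene; negative offsets are upstream
    and positive offsets downstream of the transcription start site.
    """
    text = raw.strip().replace(",", "").replace("\u2212", "-")
    offset = int(text)
    if offset == 0:
        return offset, "within gene"
    return offset, "upstream" if offset < 0 else "downstream"


# Rows from Table 2 (Premature, yield of green beans): SNP ID, distance, Portion.
rows = [
    ("5.24257788", "\u22121708", 0.088),
    ("11.30697733", "0", 0.14),
    ("7.20811138", "+26,561", 0.073),
]

# Rank by importance score, highest first, and attach the position label.
ranked = [
    (snp, classify_distance(distance)[1], portion)
    for snp, distance, portion in sorted(rows, key=lambda r: r[2], reverse=True)
]
```

Keeping the label next to the importance score makes it easy to see which top-ranked SNPs fall inside a gene and which are only the nearest neighbors of one.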
Table 3. Top five candidate genes identified by Bootstrap Forest models for three agronomic traits in the Intermediate population.
| SNP ID | Nearest Gene and Function | Base Pairs Away | Importance Score (Portion) |
|---|---|---|---|
| Intermediate, production of coffee beans: Bootstrap Forest | | | |
| 4.9597530 | Cc04t10310.1; Non-specific phospholipase C6 | 0 | 0.022 |
| 7.20000085, 7.20000113 | Cc07t19900.1; Smr domain-containing protein | −737 | 0.016 |
| 4.15068413 | Cc04t12970.1; Alpha-N-acetylglucosaminidase | +804 | 0.012 |
| 2.26624120 | Cc02t28190.1; BHLH domain-containing protein | +1083 | 0.01 |
| 4.3864735 | Cc04t05180.1; Phytocyanin domain-containing protein | 0 | 0.01 |
| Intermediate, leaf rust incidence: Bootstrap Forest | | | |
| 5.28610373 | Cc05t15840.1; TPR_REGION domain-containing protein | 0 | 0.022 |
| 2.22416916 | Cc02t25100.1; Nitrate regulatory gene 2 protein | 0 | 0.019 |
| 10.3678570 | Cc10t04730.1; C2H2-type domain-containing protein | 0 | 0.018 |
| 8.30746352 | Cc08t16130.1; Nucleoside diphosphate kinase | 0 | 0.017 |
| 10.3612160 | Cc10t04630.1; BHLH domain-containing protein | 0 | 0.014 |
| Intermediate, yield of green beans: Bootstrap Forest | | | |
| 4.234906 | Cc04t00320.1; Conserved hypothetical protein | 0 | 0.019 |
| 10.1467983 | Cc10t01960.1; Regulator of chromosome condensation (RCC1) family protein | 0 | 0.012 |
| 11.1400701 | Cc11t00520.1; Exopolygalacturonase | +2297 | 0.01 |
| 11.30547303 | Cc11t13750.1; Galectin domain-containing protein | −6340 | 0.01 |
| 11.20731087 | Cc11t05390.1; NB-ARC domain-containing protein | 0 | 0.0092 |
SNPs were ranked based on their variable importance (“Portion”) in the models, with the corresponding genomic locations shown in Figure 4. The table lists the SNP identifier (SNP ID); the nearest gene and its putative function; the distance in base pairs between the SNP and the gene’s start site (0 indicates that the SNP lies within the gene; negative and positive distances correspond to positions upstream and downstream of the transcription start site (TSS), respectively); and the importance score from the model.
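How a "Portion" score relates to a ranked top-five list can be sketched as below. The sketch assumes that Portion is each SNP's share of the total split contribution (for example, sum of squares) accumulated across the trees of the forest, so that all portions sum to 1; this interpretation is an assumption rather than something stated in the table notes. The SNP labels and contribution values are hypothetical.

```python
def portions(split_contributions):
    """Map each SNP to its share of the summed split contributions.

    Assumed interpretation of the "Portion" importance score.
    """
    total = sum(split_contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {snp: value / total for snp, value in split_contributions.items()}


def top_n(portion_by_snp, n=5):
    """Return the n SNPs with the largest portions, highest first."""
    return sorted(portion_by_snp.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical split contributions for three SNPs.
example = portions({"snp_a": 6.0, "snp_b": 3.0, "snp_c": 1.0})
best_two = top_n(example, n=2)
```

Because the scores are shares of a total, small top-five values such as those in Table 3 would indicate that importance is spread across many SNPs rather than concentrated in a few.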
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ahn, E.; Park, S.; Bhatt, J.; Lim, S.; Meinhardt, L.W. Lipid Metabolism and Actin Cytoskeleton Regulation Underlie Yield and Disease Resistance in Two Coffea canephora Breeding Populations. Plants 2025, 14, 3675. https://doi.org/10.3390/plants14233675
