Uncovering Stable Genetic Loci for Sustainable Pea (Pisum sativum L.) Production Through Genome-Wide Association Mapping

Zatybekov, Alibek; Ten, Evgeniy; Oshergina, Irina; Radul, Sergey; Amalova, Akerke; Abugalieva, Saule; Turuspekov, Yerlan

doi:10.3390/agronomy16090934


by Alibek Zatybekov ¹, Evgeniy Ten ², Irina Oshergina ², Sergey Radul ³, Akerke Amalova ¹, Saule Abugalieva ¹,⁴ and Yerlan Turuspekov ¹,⁴,*

¹ Laboratory of Molecular Genetics, Institute of Plant Biology and Biotechnology, Almaty 050040, Kazakhstan

² A. I. Barayev Research and Production Center for Grain Farming, Shortandy 021600, Akmola Region, Kazakhstan

³ Karabalyk Agricultural Experimental Station, Nauchnyi 110000, Kostanay Region, Kazakhstan

⁴ Faculty of Biology and Biotechnology, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan

* Author to whom correspondence should be addressed.

Agronomy 2026, 16(9), 934; https://doi.org/10.3390/agronomy16090934

Submission received: 2 April 2026 / Revised: 23 April 2026 / Accepted: 25 April 2026 / Published: 4 May 2026

(This article belongs to the Special Issue Crop Genomics and Omics for Future Food Security)


Abstract

A comprehensive evaluation of phenotypic diversity, genetic structure, and marker–trait associations was conducted in a pea (Pisum sativum L.) collection of 184 accessions, using multi-environment field trials and genome-wide SNP data. Agronomic traits were assessed using best linear unbiased estimates, and statistical analyses included correlation, analysis of variance, heritability estimation, population structure, linkage disequilibrium, and a genome-wide association study of 10,289 SNP markers. Phenological traits showed low variability, with flowering and maturity averaging 36.08 and 79.19 days (coefficients of variation of 6.17% and 3.79%, respectively), whereas yield-related traits varied more widely, with the number of pods per plant showing a coefficient of variation of 26.14%. A strong correlation was observed between plant height and height of the lowest pod attachment (r = 0.89, p < 0.001), while moderate positive correlations were found between flowering and maturity time (r = 0.43, p < 0.001) and between number of pods per plant and plant height (r = 0.44, p < 0.001); meanwhile, thousand seed weight exhibited a significant negative correlation with the number of pods per plant (r = −0.42, p < 0.001). Heritability was highest for plant height (H² = 0.925), height of the lowest pod attachment (H² = 0.889), and thousand seed weight (H² = 0.883), while yield showed lower heritability (H² = 0.672) and strong environmental influence. Linkage disequilibrium decay was 1.78 Mb at r² = 0.2. GWAS identified 163 quantitative trait loci, including 19 stable loci, with strong effects such as −19.27 cm for q.PH.5-1 and +24.62 g for q.TSW.4-2. Candidate genes associated with key biological processes were identified, thereby enhancing understanding of the genetic control of traits.

Keywords: world collection; phenotypic diversity; population structure; GWAS; marker-assisted selection

1. Introduction

Pea (Pisum sativum L., 2n = 14) is one of the most important cool-season grain legumes cultivated worldwide, providing a valuable source of plant-based protein, carbohydrates, vitamins, and minerals for both human consumption and animal feed. According to the Food and Agriculture Organization (FAO), dry peas are grown on more than 10–12 million hectares globally, reflecting their agronomic adaptability and economic importance [1]. Pea seeds typically contain 20–25% protein and are particularly rich in lysine, complementing cereal-based diets that are deficient in essential amino acids [2,3]. In addition to their nutritional value, peas contribute to sustainable agriculture through biological nitrogen fixation in symbiosis with Rhizobium bacteria, reducing the need for synthetic fertilizers and improving soil fertility [4].

In Kazakhstan, grain legumes, including pea, chickpea (Cicer arietinum L.), lentil (Lens culinaris Medik.), and soybean (Glycine max L.), are gaining increasing importance as diversification crops within cereal-based systems. The northern regions of Kazakhstan, characterized by a continental climate with frequent droughts and temperature fluctuations, are particularly suitable for field pea cultivation [1,5]. Recent national statistics indicate a steady increase in the area under grain legumes, driven by both domestic demand and export opportunities, especially to Asian markets (Bureau of National Statistics of Kazakhstan). However, productivity remains variable due to abiotic stresses, suboptimal agronomic practices, and limited availability of locally adapted high-yielding cultivars [6,7,8,9,10]. Therefore, improving key agronomic traits associated with yield stability and adaptability is a priority for legume breeding programs in Kazakhstan and similar agroecological zones.

Grain yield in pea is a complex quantitative trait determined by several interrelated components, including flowering time, maturity time, plant height, height of the lowest pod attachment, number of pods per plant, number of seeds per pod, and thousand seed weight [11,12]. Flowering time and maturity time are critical phenological traits that determine pea genotypes’ adaptation to specific environments. Early flowering and maturity can help plants escape terminal drought and heat stress, which are common in continental climates such as those of Central Asia [13]. Conversely, longer growth duration may allow for greater biomass accumulation and potentially higher yield under favorable conditions [14]. The genetic control of flowering time in pea involves multiple loci responsive to photoperiod and temperature, making it a key target for breeding programs aiming to optimize crop adaptation across latitudes [15].

Plant height and the height of the lowest pod attachment are important morphological traits influencing both yield potential and harvest efficiency. Taller plants often exhibit greater biomass and higher yield potential; however, excessive height may increase susceptibility to lodging, particularly under high-input conditions [16]. The height of the lowest pod is especially critical for mechanized harvesting, as low pod placement can result in significant yield losses due to incomplete harvesting [17]. Breeding efforts have therefore focused on optimizing plant architecture, including semi-leafless types that improve standability and light interception [18].

Yield components such as the number of pods per plant, the number of seeds per pod, and thousand-seed weight directly contribute to final grain yield and are commonly used as selection criteria in breeding programs. The number of pods per plant is often the most variable component and is strongly influenced by environmental conditions and genotype × environment (G × E) interactions [12]. The number of seeds per pod is generally more stable but can be affected by stress during the flowering and pod-set stages [19]. Thousand-seed weight reflects seed size and is an important quality trait, particularly for market classes with specific size requirements. However, trade-offs among yield components are common; for example, an increase in seed number may be accompanied by a decrease in seed size, complicating selection strategies [20]. Therefore, understanding the genetic architecture underlying these traits is essential for achieving simultaneous improvement.

Advances in molecular genetics and genomics have significantly enhanced the ability to dissect complex traits in pea. The availability of a high-quality reference genome [21] and the development of high-throughput genotyping platforms, such as genotyping-by-sequencing (GBS) and SNP arrays, have enabled genome-wide analysis of genetic variation [22,23]. SNP markers, due to their abundance and genome-wide distribution, provide high-resolution insights into genetic diversity and linkage disequilibrium patterns [24]. These tools are particularly valuable for crops like pea, which historically lagged behind major cereals in genomic resource development.

Genome-wide association studies (GWASs) have emerged as a powerful approach for identifying marker–trait associations by exploiting natural genetic variation in diverse germplasm collections [25]. Compared with traditional biparental quantitative trait locus (QTL) mapping, GWASs offer higher mapping resolution and the ability to analyze multiple alleles simultaneously. In pea, GWAS has been successfully applied to identify loci associated with yield components, phenology, disease resistance, and seed quality traits [12,26,27,28]. However, relatively few studies have examined the genetic basis of yield-related traits under the specific environmental conditions of Central Asia, including Kazakhstan, where drought and heat stress are major limiting factors.

The utilization of diverse germplasm is essential for broadening the genetic base of cultivated pea and enhancing breeding progress. Germplasm collections, including landraces and breeding lines, harbor valuable alleles for adaptation and productivity traits [29]. In Kazakhstan and neighboring regions, local germplasm may possess unique adaptations to continental climates, such as tolerance to early-season cold and terminal drought. Characterizing this diversity using both phenotypic and genotypic data is crucial for identifying promising parental lines and developing improved cultivars.

In this context, multi-environment phenotyping combined with high-density genotyping provides a robust framework for understanding the genetic control of complex traits. The integration of phenotypic data across years and locations enables estimation of stable genetic effects while accounting for environmental variability [30]. Such approaches are particularly relevant for traits like yield per square meter, which integrates multiple component traits and reflects overall plant performance under field conditions.

The objectives of the present study were to evaluate phenotypic variation and genotype × environment interactions for key agronomic traits, assess the genetic diversity and population structure of a pea association panel, and identify SNP markers associated with these traits using a genome-wide association study. The results of this study will contribute to a better understanding of the genetic architecture of yield-related traits in pea and support the development of improved cultivars adapted to the agroecological conditions of Kazakhstan and similar environments.

2. Materials and Methods

2.1. Plant Material and Field Experiments

A panel of 184 P. sativum accessions from six geographical regions across 22 countries was used in this study (Table S1). Field evaluations were conducted across two locations during the 2024–2025 growing seasons, totaling four environments: the Karabalyk Agricultural Experimental Station (KAES, 53.855124, 62.120378) and the A. I. Barayev Research and Production Center for Grain Farming (RPCGF, 51.702556, 70.941558). The experiments were arranged in a randomized complete block design (RCBD) with two replicates per location. Climatic conditions at both locations during the study period are presented in Figure S1.

Eight agronomic traits were evaluated: flowering time (VER2, days), maturity time (VER8, days), plant height (PH, cm), height of the lowest pod attachment (HLAP, cm), number of pods per plant (NPP, count), number of seeds per pod (NSPP, count), thousand seed weight (TSW, g), and yield per square meter (Ypm², g). Phenological stages were recorded according to standardized growth scales for grain legumes based on the BBCH scale adapted for pea. Flowering time was defined as the interval from seedling emergence until more than 75% of the buds had opened. Similarly, maturity time was calculated from emergence until more than 75% of the pods reached physiological maturity. Morphological and yield-related traits were measured on representative plants within each plot, and plot means were used for subsequent analyses [31].

2.2. Statistical Analysis

Best Linear Unbiased Estimates (BLUEs) were calculated across environments using linear mixed models implemented in R (version 4.3.2), using the lme4 (v1.1-34) and lmerTest (v3.1-3) packages [32]. Genotype was treated as a fixed effect, while location, year, and replication were considered random effects.
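The model described above can be written out explicitly. The following is a sketch under assumed notation: the symbols, and the nesting of replication within location and year, are illustrative choices and are not stated in the paper.

```latex
y_{ijkl} = \mu + g_i + l_j + t_k + r_{l(jk)} + \varepsilon_{ijkl},
\qquad
l_j \sim N(0, \sigma^2_{l}),\quad
t_k \sim N(0, \sigma^2_{t}),\quad
r_{l(jk)} \sim N(0, \sigma^2_{r}),\quad
\varepsilon_{ijkl} \sim N(0, \sigma^2_{\varepsilon})
```

Here y is the plot mean of genotype i at location j in year k and replicate l, g is the fixed genotype effect, and the remaining terms are random. Under this notation the BLUE of genotype i is the estimated intercept plus the estimated genotype effect.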

Pearson’s correlation coefficients among traits were calculated based on BLUE values using the Hmisc (v5.1-1) and corrplot (v0.92) packages in R.

Two-way ANOVA was performed using the stats package in R to partition variance into genotype (G), environment (E; defined as location × year), and genotype × environment interaction (G × E) effects. Statistical significance was assessed using F-tests [33]. Broad-sense heritability (H²) was estimated based on variance components derived from the model [34].
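For readers who want to see how variance components translate into H², the sketch below uses the common entry-mean formula, H² = σ²G / (σ²G + σ²GE/e + σ²ε/(e·r)). The paper does not state which formula was applied, so this choice is an assumption, and the variance components in the example are hypothetical.

```python
def broad_sense_h2(var_g, var_ge, var_err, n_env, n_rep):
    """Entry-mean broad-sense heritability from variance components.

    Assumed formula (not confirmed by the paper):
        H2 = var_g / (var_g + var_ge / n_env + var_err / (n_env * n_rep))
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    phenotypic = var_g + var_ge / n_env + var_err / (n_env * n_rep)
    return var_g / phenotypic


# Hypothetical variance components; 4 environments and 2 replicates
# match the trial layout described in Section 2.1.
h2 = broad_sense_h2(var_g=10.0, var_ge=2.0, var_err=4.0, n_env=4, n_rep=2)
print(round(h2, 3))
```

With more environments or replicates the interaction and error terms shrink, which is why entry-mean heritability rises as trials are added.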

2.3. Genotyping and SNP Dataset

Genotyping was performed using the GenoPea 13.2K Array [35], a high-density SNP genotyping platform developed specifically for P. sativum, providing genome-wide marker coverage. Quality control of SNP data was conducted using TASSEL (version 5.2.90), including filtering for missing data (>10%) and minor allele frequency (MAF < 0.05). After filtering, 10,289 high-quality SNP markers were retained for downstream analyses.
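The two filters can be illustrated with a small stand-alone sketch. It is not the TASSEL implementation; the 0/1/2 allele-count coding, the use of `None` for missing calls, and the function name are assumptions made for illustration.

```python
def passes_qc(genotypes, max_missing=0.10, min_maf=0.05):
    """Return True if one SNP passes the missing-data and MAF filters.

    genotypes: per-accession alternate-allele counts (0, 1, 2) or None
    for a missing call. This coding is an assumption of the sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return False
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return missing_rate <= max_missing and maf >= min_maf


# Hypothetical SNPs across ten accessions.
common_snp = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]       # MAF = 0.10, kept
sparse_snp = [None, None, 0, 2, 0, 2, 0, 2, 0, 2]  # 20% missing, dropped
print(passes_qc(common_snp), passes_qc(sparse_snp))
```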

2.4. Population Structure and Genetic Diversity

Principal component analysis (PCA) was conducted using the SNPRelate (v1.34.0) package in R to assess genetic structure among accessions [36]. A neighbor-joining phylogenetic tree was constructed using the ape package (v5.7-1) in R [37]. The resulting tree was visualized and annotated using iTOL (version 6.8) [38].

Population structure was inferred using STRUCTURE (version 2.3.4) with an admixture model and correlated allele frequencies. The number of clusters (K) was tested from 1 to 10 with 10 independent runs per K, using a burn-in period of 100,000 iterations followed by 100,000 MCMC repetitions. The optimal K value was determined using the ΔK method [39]. The kinship matrix was calculated using the centered identity-by-state (IBS) method implemented in TASSEL (v5.2.90) [40] and visualized using the pheatmap (v1.0.12) package in R.
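The ΔK statistic can be computed from the log-likelihoods of the replicate STRUCTURE runs. The sketch below uses the form based on per-K means, ΔK = |mean L(K+1) − 2·mean L(K) + mean L(K−1)| / sd L(K); the input layout and the example values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Evanno's delta K for every K that has both neighbours.

    lnp_by_k: dict mapping K to the list of ln P(D) values from the
    independent runs at that K (at least two runs per K).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the K range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when runs are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical ln P(D) values, three runs per K.
runs = {
    1: [-5000.0, -5002.0, -5001.0],
    2: [-4500.0, -4502.0, -4501.0],
    3: [-4450.0, -4460.0, -4455.0],
    4: [-4440.0, -4470.0, -4455.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Because ΔK needs both neighbours, it cannot be evaluated at K = 1 or at the largest K tested.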

2.5. Association Mapping

Linkage disequilibrium (LD) decay was estimated using TASSEL (v5.2.90) by calculating the squared correlation coefficient (r²) between SNP pairs [40]. LD decay curves were generated using LOESS smoothing implemented in the ggplot2 (v3.4.4) package in R [41].
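As an illustration of the r² statistic, the sketch below computes the squared Pearson correlation between the genotype vectors of two SNPs. This is a simple genotype-based approximation and not necessarily the exact procedure TASSEL applies; the 0/1/2 coding and the function name are assumptions.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNP genotype vectors.

    Genotypes are alternate-allele counts (0, 1, 2); pairs with a
    missing call (None) in either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two accessions with both calls")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return (s_ab * s_ab) / (s_aa * s_bb)


# Hypothetical genotype vectors: identical SNPs, then independent SNPs.
print(genotype_r2([0, 1, 2, 0], [0, 1, 2, 0]))
print(genotype_r2([0, 2, 0, 2], [0, 0, 2, 2]))
```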

Genome-wide association analysis was conducted using a multi-locus mixed model (MLMM) implemented in GAPIT (version 3.0) in R. The MLMM approach iteratively incorporates associated markers as cofactors, improving detection power while controlling false positives. The model accounted for population structure (principal components) and kinship (K matrix) [42,43]. Significant associations were identified using a threshold of p < 1 × 10⁻⁵, and stable QTLs were defined across environments. Manhattan and quantile–quantile (QQ) plots were visualized with the rMVP package [44].

Candidate genes within significant QTL regions were identified from physical positions delimited by the LD decay distance (1.78 Mb, r² = 0.2), using the reference genome assembly (Cameor v1a). Functional annotation was performed using EnsemblPlants (http://plants.ensembl.org/Pisum_sativum/Info/Index, accessed on 16 February 2026), the Pulse Crop Database (https://www.pulsedb.org/organism/639, accessed on 16 February 2026), and the QuickGO database (https://www.ebi.ac.uk/QuickGO/, accessed on 16 February 2026). Genetic map visualization was performed using MapChart v. 2.32 [45]. Additional graphical outputs were generated using SRplot [46].
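A position-based candidate search of this kind can be sketched as an interval-overlap query. The sketch assumes a symmetric window of one LD decay distance on each side of the associated SNP, which the paper does not spell out; the gene names and coordinates are hypothetical.

```python
LD_DECAY_BP = 1_780_000  # 1.78 Mb, the LD decay distance at r2 = 0.2


def genes_in_window(snp_pos_bp, genes, window_bp=LD_DECAY_BP):
    """Names of genes overlapping [snp - window, snp + window].

    genes: iterable of (name, start_bp, end_bp) on the SNP's chromosome.
    The symmetric window is an assumption of this sketch.
    """
    lower = max(0, snp_pos_bp - window_bp)
    upper = snp_pos_bp + window_bp
    return [name for name, start, end in genes
            if start <= upper and end >= lower]


# Hypothetical annotation records on one chromosome.
annotation = [
    ("geneA", 638_500_000, 638_503_000),  # about 1.4 Mb upstream, inside
    ("geneB", 641_000_000, 641_004_000),  # about 1.1 Mb downstream, inside
    ("geneC", 650_000_000, 650_002_000),  # about 10 Mb away, outside
]
print(genes_in_window(639_900_000, annotation))
```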

3. Results

3.1. Phenotypic Variability, Correlation, and Heritability of Agronomic Traits

The agronomic performance and variability of the pea collection studied (Table S2), as indicated by BLUE values, are summarized in Table 1. Phenological traits exhibited the lowest level of variation among the evaluated accessions. VER2 averaged 36.08 days, ranging from 31.67 to 41.00 days (CV = 6.17%), while VER8 was even less variable (CV = 3.79%), with a mean of 79.19 days and a range of 72.28 to 85.72 days. The earliest-maturing accessions were the cultivars Kleine Keinlengerin (Finland) and L-22090 (Russia), both of which reached VER8 in 72 days.
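The coefficient of variation reported here is the sample standard deviation expressed as a percentage of the mean. A minimal sketch, with hypothetical input values:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100


# Hypothetical flowering times (days) for three accessions.
print(round(coefficient_of_variation([30, 36, 42]), 2))
```

Read in reverse, a mean of 36.08 days with a CV of 6.17% corresponds to a standard deviation of roughly 2.2 days for VER2.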

In contrast, traits related to plant architecture and productivity demonstrated substantial diversity. PH ranged from 33.44 to 134.44 cm, with a mean of 72.02 cm and a CV of 24.54%. Clear outliers for PH in the 2025 KAES trial included the cultivars Sakharnyy rafinad (Pea-033 = 152 cm) and Karagandinskiy 1043 (Pea-056 = 130 cm), which were roughly double the collection average. Similarly, HLAP showed high variability (CV = 25.48%), with values ranging from 18.96 to 75.01 cm.

Yield components also varied considerably across the collection. The NSPP averaged 5.49 (3.67–7.67), while the NPP exhibited the highest variation in the study (CV = 26.14%), ranging from 4.08 to 16.41. The TSW mean was 202.43 g, with a maximum value of 347.89 g (cultivar Telefon from Italy). Finally, the Ypm² averaged 228.34 g, with accessions ranging from 100.18 to 330.05 g (CV = 16.39%).

Correlation analysis revealed several highly significant relationships between the traits studied (Figure 1). A strong positive correlation was observed between PH and HLAP (r = 0.89), while moderate positive correlations were found between VER2 and VER8 (r = 0.43) and between NPP and PH (r = 0.44).
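The significance of a Pearson coefficient can be checked with the usual t statistic, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below assumes that all 184 accessions contributed BLUE values to each pair of traits, which the text does not state explicitly.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("n must be at least 3")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Assumed sample size: the full panel of 184 accessions.
for label, r in [("PH-HLAP", 0.89), ("VER2-VER8", 0.43), ("NPP-PH", 0.44)]:
    print(label, round(correlation_t_statistic(r, 184), 2))
```

At this sample size, an absolute t value above about 3.3 corresponds to a two-sided p < 0.001, so the moderate coefficients near 0.43 (t of about 6.4) clear that bar comfortably.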

Regarding productivity, Ypm² showed positive correlations with PH (r = 0.28), HLAP (r = 0.24), NPP (r = 0.19), and TSW (r = 0.23), but a negative correlation with NSPP (r = −0.19). Notably, apart from its association with Ypm², TSW was significantly correlated only with NPP (r = −0.42), suggesting a trade-off between seed size and pod number. No significant correlations were observed between VER8 and other traits, except for VER2.

Subsequent analyses were conducted for only six traits, as VER2 and VER8 were not observed at the KAES site during the 2024 season. ANOVA for these six traits revealed significant effects of all primary sources of variation (Genotype, Environment, and the Genotype × Environment interaction; Table 2), except that the Genotype × Environment interaction was non-significant for NPP.

Plant architecture traits (PH and HLAP) showed highly significant Genotype effects (p < 2 × 10⁻¹⁶), with F-values of 10.51 and 9.70, respectively. Environmental influence was also significant for both traits (PH: p = 3.29 × 10⁻¹¹; HLAP: p = 1.18 × 10⁻⁵). The Genotype × Environment interaction was significant for PH (p = 1.52 × 10⁻¹²) and HLAP (p < 2 × 10⁻¹⁶). Broad-sense heritability (H²) estimates were high, reaching 0.925 for PH and 0.889 for HLAP.

Yield-related components also exhibited significant Genotype effects. For NPP, NSPP, and TSW, significance levels were p < 2 × 10⁻¹⁶, whereas Ypm² showed p = 0.011. Environmental effects were significant across all traits, with particularly high F-values for NSPP (615.37) and NPP (56.36). The Genotype × Environment interaction was significant for NSPP (p < 2 × 10⁻¹⁶), TSW (p = 1.17 × 10⁻¹²), and Ypm² (p = 0.001), but not for NPP (p = 0.0878). H² estimates ranged from 0.672 (Ypm²) to 0.883 (TSW).

3.2. Population Analysis and Genetic Relationship

The genetic structure of the pea collection studied, analyzed using 10,289 SNP markers, is presented in Figure 2. The analysis provides a comprehensive view of genetic diversity in relation to both geographic origin and morphological differentiation.

The PCA plot (Figure 2A) was used to visualize the genetic relationships among accessions by geographical origin group. The first two principal components, PC1 and PC2, accounted for 12.03% and 4.44% of the total genetic variance, respectively. Overall, the plot did not show distinct cluster separation corresponding strictly to geographical origin groups. However, a slight concentration of accessions in the Eastern Europe group was observed, whereas other groups, such as Western Europe, North America, and Kazakhstan, appeared more broadly distributed across the plot. This distribution pattern suggests a high level of genetic admixture or significant gene flow between accessions from different geographical origin groups within the collection.

Phylogenetic analysis by geographical origin group (Figure 2B) resulted in a complex radial cladogram. Genetic clustering did not align strictly with the geographic origin groups, as individuals from different locations were frequently interspersed throughout the tree’s branches rather than forming exclusive clades. However, some groups did show limited clustering tendencies. For instance, small clades composed predominantly of Western Europe or Kazakhstan accessions were discernible on certain peripheral branches. Additionally, a distinct central cluster contained multiple clades formed exclusively by accessions from Eastern Europe. This indicates that while there is some genetic relatedness tied to geographic origin, particularly for Eastern Europe, the overall genetic structure is largely influenced by broader patterns of genetic exchange across regions.

Figure 2C presents the same phylogenetic structure but overlaid with leaf differentiation data. The clustering results did not reveal a clear separation of accessions into distinct groups based on leaf type.

The optimal number of genetic clusters was determined using the Evanno method, which calculates the rate of change in the log-probability of data between successive K values (Delta K) (Figure 3). The Delta K plot reveals a prominent peak at K = 2 (Figure 3A). This statistical inflection point indicates that the population is most effectively partitioned into two primary ancestral groups. This level of clustering captures the most significant division in the collection’s genetic background before finer substructuring occurs at higher K values. The bar plot (Figure 3B) visualizes the individual assignment of each accession to the two inferred ancestral clusters. The distribution shows a clear division: a subset of the collection is almost entirely assigned to one of the two ancestral lineages, while a significant number of accessions exhibit varying levels of admixture.

The kinship heatmap (Figure 3C) illustrates the relative genetic relatedness between all pairs of accessions. The matrix, ordered by hierarchical clustering, reveals a distinct block-like structure along the diagonal. These blocks represent groups of accessions with high co-ancestry coefficients. While several small clusters of closely related genotypes, likely representing sister lines or cultivars derived from common breeding programs, are evident, the prevalence of darker blue regions across the off-diagonal regions indicates that a substantial portion of the collection comprises distantly related individuals, thereby maintaining high overall genetic diversity.

3.3. Identification of Stable QTLs and Their Functional Candidate Genes

The distribution of 10,289 SNP markers across the seven chromosomes of P. sativum is illustrated in Figure 4. The markers cover a total genomic length of approximately 4.4 Gb. Chromosome 5 exhibited the highest marker density, whereas chromosomes 6 and 1 were characterized by a relatively lower number of polymorphic sites. Across all chromosomes, marker distribution is non-uniform: higher densities are predominantly observed in the telomeric regions, whereas the centromeric and pericentromeric regions show a marked decrease in SNP frequency, forming “gaps” in the density map. The average marker density across the genome is 2.34 markers per 1 Mb (Figure 4A).
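The reported average density follows directly from the marker count and the covered length, and its reciprocal gives the mean spacing between adjacent markers:

```latex
\frac{10{,}289\ \text{SNPs}}{4400\ \text{Mb}} \approx 2.34\ \text{SNPs per Mb},
\qquad
\frac{4400\ \text{Mb}}{10{,}289\ \text{SNPs}} \approx 0.43\ \text{Mb per SNP}
```

Because the distribution is non-uniform, local spacing is tighter than 0.43 Mb in the distal regions and wider in the pericentromeric gaps.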

The rate of LD decay was estimated by analyzing the squared correlation coefficient (r²) as a function of the physical distance between marker pairs (Figure 4B). The initial r² value at short physical distances was approximately 0.45. A rapid decline in LD is observed within the first 1000 kb. Based on the intersection of the LOESS regression curve with the critical threshold of r² = 0.2, the average LD decay distance across the entire genome is 1.78 Mb. Beyond this distance, r² values stabilize at a background level below 0.1.
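The decay distance is the point at which the smoothed curve crosses the threshold. A minimal sketch of that step, using linear interpolation between points of an already smoothed curve; the LOESS fit itself is not reproduced, and the example curve is hypothetical.

```python
def ld_decay_distance(distances, r2_values, threshold=0.2):
    """First distance at which a smoothed r2 curve drops below threshold.

    distances and r2_values describe the fitted curve, ordered by
    increasing distance. Returns None if the curve never crosses.
    """
    for i in range(1, len(distances)):
        prev_r2, cur_r2 = r2_values[i - 1], r2_values[i]
        if prev_r2 >= threshold > cur_r2:
            # Linear interpolation between the two bracketing points.
            fraction = (prev_r2 - threshold) / (prev_r2 - cur_r2)
            step = distances[i] - distances[i - 1]
            return distances[i - 1] + fraction * step
    return None


# Hypothetical smoothed curve (distance in Mb, fitted r2).
curve_mb = [0.0, 1.0, 2.0, 3.0]
curve_r2 = [0.45, 0.30, 0.15, 0.08]
print(ld_decay_distance(curve_mb, curve_r2))
```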

In total, the GWAS analysis identified 163 QTLs across the collection (Table S3). Of these, 19 QTLs were found to be significantly associated in two or more trials (Table 3). We define these 19 QTLs as “stable,” meaning genomic regions that consistently influence a given agronomic trait across multiple environments, years, or both, as reflected in the distinct trials in this study.
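The stability criterion can be expressed as a simple filter over the list of significant associations. The sketch below applies the p < 1 × 10⁻⁵ threshold and keeps loci detected in at least two trials; the record layout, locus names, and trial labels are hypothetical.

```python
from collections import defaultdict


def stable_qtls(hits, p_threshold=1e-5, min_trials=2):
    """QTLs that are significant in at least min_trials distinct trials.

    hits: iterable of (qtl_name, trial_label, p_value) records.
    """
    trials_by_qtl = defaultdict(set)
    for qtl, trial, p_value in hits:
        if p_value < p_threshold:
            trials_by_qtl[qtl].add(trial)
    return sorted(qtl for qtl, trials in trials_by_qtl.items()
                  if len(trials) >= min_trials)


# Hypothetical association records.
records = [
    ("q.A", "KAES-2024", 7.7e-7),
    ("q.A", "RPCGF-2025", 2.4e-6),   # second trial, so q.A is stable
    ("q.B", "KAES-2025", 5.1e-7),    # single trial only
    ("q.C", "KAES-2024", 3.0e-6),
    ("q.C", "RPCGF-2024", 4.0e-4),   # above the threshold, not counted
]
print(stable_qtls(records))
```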

For morphological traits, a highly significant QTL for PH (q.PH.5-1) was identified on Chromosome 5 at 639.9 Mb. This locus, associated with the SNP PsCam037922_22979_691, exhibited a substantial negative effect of −19.27 cm. Notably, the same SNP was also associated with HLAP at the q.HLAP.5-2 locus, where it likewise showed a negative effect (−6.44). In total, six stable QTLs were mapped for HLAP, distributed across Chromosomes 2, 4, 5, and 6, with effect sizes ranging from −6.44 to 4.19.

Regarding seed productivity, two stable QTLs were identified for the NPP on Chromosomes 3 and 5, with q.NPP.3-1 (p = 7.73 × 10⁻⁷) showing a positive effect of 1.88. For the NSPP, three QTLs were localized on Chromosomes 1, 2, and 5, and the most significant association was q.NSPP.1-1 (p = 5.11 × 10⁻⁷).

TSW, together with HLAP, had the highest number of stable loci (six each); the TSW loci were distributed across Chromosomes 1, 2, 3, and 4. The locus q.TSW.4-2 on Chromosome 4 exhibited the largest positive effect (+24.62 g) with a highly significant p-value (2.38 × 10⁻⁷). Finally, a stable QTL for Ypm² (q.Ypm2.7-1) was identified on Chromosome 7 at 60.6 Mb, demonstrating a positive effect of 20.56.

Candidate gene analysis within stable QTL regions identified several genetic factors regulating pea agronomic traits (Table 4). For morphological traits, Psat5g299720 (LE) was identified within both q.PH.5-1 and q.HLAP.5-2. This gene encodes gibberellin 3-beta-dioxygenase, an enzyme with oxidoreductase activity involved in the gibberellin biosynthetic process. The q.HLAP.6-1 locus contains the PHYA gene, which functions as a photoreceptor in red- and far-red-light phototransduction. Additional genes for HLAP include Psat2g178840 (SNARE binding and exocytosis), Psat4g150760 (oxidoreductase activity), and Psat5g121040 (localized to the plasmodesma).

Regarding yield components, q.NPP.3-1 harbors Psat3g110240, associated with protein binding. For NSPP, candidate genes include Psat2g046240 (glutamine synthetase activity) and Psat5g306040 (regulation of DNA-templated transcription).

Candidate genes for TSW include Psat3g137280, involved in lignin catabolism, as well as Psat4g094200 and Psat4g121160, which function in protein ubiquitination and Golgi vesicle coating, respectively. Finally, q.Ypm2.7-1 for Ypm² contains Psat7g031840, which is predicted to regulate receptor-mediated endocytosis and the activity of structural molecules.

The physical distribution of stable QTLs and known genes across the seven chromosomes of the pea genome is illustrated in Figure 5. Chromosome 1 primarily harbors yield-related loci such as q.TSW.1-1 and q.NSPP.1-1. Chromosome 5 represents a significant cluster of genetic control, containing a major pleiotropic region near the distal end associated with PH and HLAP (q.PH.5-1/q.HLAP.5-2), located in close proximity to the known Le gene. Additionally, Chromosome 5 contains multiple loci for yield components, including q.NPP.5-1 and q.NSPP.5-1.

Chromosome 2 and Chromosome 3 also show clusters of associations; Chromosome 2 features the SGR1 gene and the stable locus q.HLAP.2-1, while Chromosome 3 contains q.NPP.3-1 and q.TSW.3-1 near the known SBE1 gene. Chromosome 4 exhibits a high density of TSW-related loci (q.TSW.4-1, q.TSW.4-2). Finally, a stable yield locus, q.Ypm2.7-1, is positioned on the distal arm of Chromosome 7.

4. Discussion

4.1. Phenotypic Variability, Correlation, and Heritability of Agronomic Traits

The present study revealed low variability for phenological traits (CV = 3.79–6.17%) and substantially higher variation for morphological and yield components (CV up to 26.14%) (Table 1), consistent with previous findings in P. sativum. For instance, Carlson-Nilsson et al. [47] reported limited variation in flowering and maturity alongside broader diversity in productivity-related traits, reflecting stronger environmental regulation of phenology and greater genetic flexibility in yield components.

The application of BLUEs across the two experimental stations provides a reliable framework for evaluating genotypic performance while accounting for environmental variation under the existing testing conditions. The current design reflects the major pea evaluation sites available in Kazakhstan, which, although geographically close, constitute the key national infrastructure for pea field trials. Similar analytical approaches have been recommended by Kebebe et al. [48], particularly for multi-environment genetic evaluations, although many studies typically include broader environmental contrasts. Despite the limited geographic distance between the testing sites, significant genotype, environment, and G × E interaction effects were detected, which is consistent with the findings of Dehghani et al. [49], indicating that even moderate environmental differences can significantly influence trait expression in legumes. However, it should be noted that using only two replicates per environment may increase residual experimental error and slightly reduce the precision of BLUEs and heritability for some traits. This limitation is inherent to field-based trials and should be considered when interpreting results for complex traits such as yield. The observed correlations, particularly between VER2 and VER8 (r = 0.43) and between PH and HLAP (r = 0.89), are consistent with established developmental relationships [50] (Figure 1). The negative association between TSW and NPP supports previously reported trade-offs, indicating constraints on assimilate partitioning between seed size and pod number [51] (Figure 1).

Heritability estimates were high for morphological traits (PH: 0.925; HLAP: 0.889) and TSW (0.883), suggesting strong genetic control, in agreement with Kosev et al. [52] (Table 2). Conversely, the lower heritability of yield (0.672) and its strong environmental dependence are consistent with findings by Rana et al. [53] (Table 2). While breeding for morphological traits and yield components can be achieved in a single environment, improving overall yield requires testing across multiple locations to account for persistent environmental sensitivity, even at proximate sites [54].

4.2. Population Analysis and Genetic Relationship

The population structure inferred from 10,289 SNP markers revealed weak clustering by geographic origin and a high level of admixture, which is consistent with previous studies in pea. For instance, Smýkal et al. [55] demonstrated that modern P. sativum germplasm often lacks clear geographic stratification due to extensive exchange and breeding. Similarly, Brhane et al. [56] reported low variance explained by the first principal components (<15%), comparable to the 16.47% jointly explained by PC1 and PC2 in this study (12.03% and 4.44%, respectively), indicating complex genetic backgrounds and weak population structure.

The absence of strong phylogenetic clustering by origin contrasts with studies of more geographically isolated germplasm but aligns with findings by Burstin et al. [57] in which gene flow and historical selection reduced geographic signals. However, the partial clustering of Eastern European accessions in the present study suggests some regional conservation of allelic combinations, supporting observations from Siol et al. [58] that certain gene pools retain localized structure.

Notably, the stronger alignment between genetic distance and morphological differentiation indicates that SNP markers effectively capture functional variation. This agrees with the results of Gali et al. [12], in which marker-based clustering corresponded more closely to phenotypic traits than to origin. Such patterns suggest that selection for agronomic traits plays a more significant role than geographic separation in shaping genetic structure.

The identification of K = 2 as the optimal clustering level is also widely reported in pea diversity studies, including Pavan et al. [59], reflecting the presence of two major ancestral gene pools. Kinship analysis supports the observed population structure, revealing both closely related groups of accessions and a high level of overall genetic diversity, consistent with previous findings [60]. Overall, the results suggest that despite the relatively uniform testing environments, the collection maintains high genetic diversity with weak geographic structure, emphasizing the dominant role of breeding history and selection over geographic origin.

4.3. Identification of Stable QTLs and Their Functional Candidate Genes

The genome-wide SNP distribution and LD decay patterns observed in this study are consistent with previous genomic analyses in P. sativum. The non-uniform marker distribution, characterized by higher SNP density in telomeric regions and reduced polymorphism in centromeric and pericentromeric regions, reflects recombination landscapes described by Kreplak et al. [21]. Such patterns are typical for large plant genomes, where gene-rich distal chromosomal regions exhibit higher recombination rates and genetic diversity. The estimated LD decay of 1.78 Mb in this study is consistent with earlier reports of 1–5 Mb in pea (Gali et al. [12]). However, the relatively large LD decay suggests that the identified QTL intervals may still contain multiple candidate genes; therefore, fine-mapping and functional validation will be required to confirm causal variants. This relatively slow LD decay also reflects the self-pollinating nature of the pea and indicates that the marker density applied here is appropriate for genome-wide association studies, although it inherently limits mapping resolution compared to outcrossing species.
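The LD decay distance quoted above is the physical distance at which the average r² falls to 0.2. A minimal sketch of that read-off is given below, assuming the r² values have already been averaged in distance bins and that the crossing is located by linear interpolation between neighbouring bins. The function name and the binned values are invented for illustration; they are not data from this study, and the fitting procedure used by the authors may differ.

```python
from typing import Optional, Sequence

def ld_decay_distance(
    dist_mb: Sequence[float],
    mean_r2: Sequence[float],
    threshold: float = 0.2,
) -> Optional[float]:
    """Distance (Mb) at which a binned r2 curve first drops below `threshold`.

    Uses linear interpolation between the two bins that bracket the threshold.
    Returns None if the curve never falls below the threshold.
    """
    for i in range(1, len(dist_mb)):
        above, below = mean_r2[i - 1], mean_r2[i]
        if above >= threshold > below:
            frac = (above - threshold) / (above - below)
            return dist_mb[i - 1] + frac * (dist_mb[i] - dist_mb[i - 1])
    return None

# Hypothetical binned averages (NOT values from the study).
dist = [0.0, 0.5, 1.0, 2.0, 4.0]
r2 = [0.60, 0.40, 0.25, 0.15, 0.10]

decay = ld_decay_distance(dist, r2)
print(f"LD decays to r2 = 0.2 at about {decay:.2f} Mb")
```

A longer decay distance means that a significant SNP tags a wider physical interval, which is why slow LD decay supports genome-wide coverage with moderate marker density while limiting mapping resolution.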

The identification of 163 QTLs, including 19 stable loci across multiple trials, is comparable with findings from other multi-environment GWASs. For example, Annicchiarico et al. [61] reported numerous environment-specific associations but emphasized that only a fraction remained consistent across environments. In the present study, stable QTLs were defined as those reproducible across trials, thereby strengthening their potential utility in breeding programs. Importantly, although the experimental sites were geographically close, significant genotype-by-environment interactions were still observed for most traits, and stable loci could be distinguished. This aligns with the conclusions of Annicchiarico et al. [61], who highlighted that even moderate environmental variation is sufficient to reveal differential genotype responses and to validate robust marker–trait associations. However, no stable QTLs associated with VER2 and VER8 were detected across trials, likely due to the very low phenotypic variability observed for these traits, as reflected by CV values of 6.17% for VER2 and 3.79% for VER8 (Table 1). The limited variation in these traits likely reduced the statistical power to identify reproducible marker–trait associations across environments.
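The idea of a stable locus as one that is reproducible across trials can be sketched as a simple filter over per-trial GWAS hits. The example below assumes that a hit is stable when the same SNP is significant for the same trait in at least two distinct trials; the exact criterion used in the study is not given in this section, so `min_trials = 2` is an assumption. The SNP names and trial labels are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, str]  # (trait, SNP id, trial label)

def stable_associations(
    hits: Iterable[Hit], min_trials: int = 2
) -> Dict[Tuple[str, str], List[str]]:
    """Keep (trait, SNP) pairs that are significant in >= min_trials distinct trials."""
    trials_by_key = defaultdict(set)
    for trait, snp, trial in hits:
        trials_by_key[(trait, snp)].add(trial)
    return {
        key: sorted(trials)
        for key, trials in trials_by_key.items()
        if len(trials) >= min_trials
    }

# Hypothetical significant hits (placeholder SNP names and trial labels).
hits = [
    ("PH", "snp_A", "KAES_trial1"),
    ("PH", "snp_A", "RPCGF_trial1"),
    ("PH", "snp_B", "KAES_trial1"),    # seen in one trial only
    ("TSW", "snp_C", "KAES_trial1"),
    ("TSW", "snp_C", "KAES_trial1"),   # duplicate record, counted once
    ("TSW", "snp_C", "RPCGF_trial2"),
]

stable = stable_associations(hits)
for (trait, snp), trials in stable.items():
    print(f"{trait} {snp}: significant in {', '.join(trials)}")
```

Counting distinct trials rather than raw records keeps a SNP that is reported twice in the same trial from being treated as reproducible.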

The co-localization of QTLs for PH and HLAP on Chromosome 5 suggests either pleiotropic effects or tight genetic linkage. This observation is consistent with previous genetic studies in pea, where major loci often influence multiple morphological traits. For instance, Weller and Ortega [15] demonstrated that key developmental genes can simultaneously regulate stem elongation, flowering, and plant architecture. The identification of Psat5g299720, which encodes gibberellin 3-beta-dioxygenase and most likely corresponds to the well-characterized LE (length) gene, is therefore highly consistent with known physiological mechanisms that control plant height [62].
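The co-localization on Chromosome 5 can be made concrete by intersecting the QTL intervals listed in Table 3. The short sketch below treats each interval as a closed range of base-pair positions; the helper name `overlap_bp` is illustrative, and the coordinates are transcribed from Table 3.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end) in bp, both ends inclusive

def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two closed intervals (0 if disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)

# Chromosome 5 intervals from Table 3.
q_ph_5_1 = (636_138_355, 639_901_919)
q_hlap_5_2 = (636_138_355, 643_508_245)
q_hlap_5_1 = (268_614_680, 272_795_731)

shared = overlap_bp(q_ph_5_1, q_hlap_5_2)
print(f"q.PH.5-1 and q.HLAP.5-2 share {shared / 1e6:.2f} Mb")
print(f"q.PH.5-1 and q.HLAP.5-1 share {overlap_bp(q_ph_5_1, q_hlap_5_1)} bp")
```

The q.PH.5-1 interval lies entirely within the q.HLAP.5-2 interval, and both are tagged by the same peak SNP (PsCam037922_22979_691), whereas q.HLAP.5-1 is a separate region on the same chromosome.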

Similarly, the detection of PHYA within the HLAP-associated locus on Chromosome 6 corresponds well with previous findings on the role of phytochrome genes in regulating plant development. Hecht et al. [63] showed that photoreceptor-mediated signaling pathways influence not only flowering time but also plant architecture through light-dependent growth responses. The presence of additional candidate genes involved in vesicle transport (Psat2g178840), oxidoreductase activity (Psat4g150760), and plasmodesmata function (Psat5g121040) further indicates that multiple cellular processes may contribute to variation in morphological traits.

For yield-related traits, the identification of multiple QTLs for TSW across Chromosomes 1, 2, 3, and 4 supports the polygenic nature of this trait [12]. The relatively large effect observed for q.TSW.4-2 suggests the presence of major-effect alleles, although most studies indicate that seed weight is typically controlled by a combination of loci with small to moderate effects [12]. The candidate genes identified within TSW-associated regions, including those involved in ubiquitination (Psat4g094200), vesicle trafficking (Psat4g121160), and lignin metabolism (Psat3g137280), are consistent with pathways implicated in seed development and biomass allocation. These functional annotations are supported by broader genomic analyses, such as those by Smýkal et al. [55], which highlighted the importance of metabolic and regulatory pathways in determining seed size and composition.

The identification of candidate genes associated with nitrogen metabolism (Psat2g046240) and transcriptional regulation (Psat5g306040) within NSPP-related QTLs also aligns with previous studies that link nutrient assimilation and transcriptional regulation to reproductive development. Likewise, the detection of Psat7g031840, which is involved in vesicle-mediated transport and endocytosis, within the yield-associated locus q.Ypm2.7-1 may support the role of intracellular transport processes in determining overall plant productivity [60]. Although several promising candidate genes were identified within stable QTL regions, their functional roles remain putative, as no expression profiling or experimental validation was performed in this study. Therefore, these genes should be regarded as candidates based on positional and functional annotation evidence only.

The integration of LD analysis, multi-environment GWAS, and candidate gene annotation in this study yields results that are highly consistent with the existing literature while also refining key genomic regions responsible for morphological and yield-related traits under relatively similar environmental conditions.

5. Conclusions

This study demonstrated substantial phenotypic and genetic diversity within the evaluated P. sativum collection, supported by both field-based and genomic analyses. Phenological traits showed low variability, with flowering time averaging 36.08 days and maturity averaging 79.19 days, whereas morphological and yield-related traits exhibited considerably higher variation, with the CV reaching 26.14% for NPP. Plant height was strongly and positively correlated with the height of the lowest pod attachment. Additionally, moderate positive correlations were observed between flowering and maturity time and between the number of pods per plant and plant height. In contrast, thousand-seed weight exhibited a significant negative correlation with the number of pods per plant. ANOVA revealed significant effects of Genotype and Environment for all six traits in Table 2 (p < 0.001, except for yield, where p = 0.011 and p = 0.005, respectively), while the G × E interaction was significant for all of these traits except NPP (p = 0.0878). High H² was recorded for plant height (0.925), height of the lowest pod attachment (0.889), and thousand-seed weight (0.883), while yield showed relatively lower heritability (0.672) and strong environmental influence. GWAS identified 163 QTLs, including 19 stable loci across environments, with strong-effect regions such as q.PH.5-1 (effect = −19.27 cm) and q.TSW.4-2 (effect = +24.62 g). The combination of moderate-to-high heritability for key agronomic traits and the identification of stable, reproducible QTLs across environments highlights the robustness of the detected genetic signals. These loci represent promising candidate regions for further investigation in marker-assisted selection to improve plant architecture and seed weight in pea. However, their direct application in breeding will require validation in independent populations and additional testing to confirm their predictive value. Overall, this study enhances our understanding of the genetic architecture of complex agronomic traits in pea and the role of genotype-by-environment interactions in shaping phenotypic variation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agronomy16090934/s1, Figure S1: Climatic conditions at KAES and RPCGF; Table S1: The list of pea accessions in the studied collection; Table S2: Field assessment of the studied pea collection at KAES and RPCGF; Table S3: GWAS results for agronomic traits in the pea collection across two experimental stations.

Author Contributions

Conceptualization, A.Z. and Y.T.; Methodology, A.Z., E.T. and S.R.; Validation, A.Z., A.A. and S.A.; Formal Analysis, E.T., I.O. and S.R.; Investigation, A.Z., E.T., I.O. and S.R.; Resources, S.A., E.T., I.O. and S.R.; Data Curation, A.Z., E.T. and I.O.; Writing—Original Draft Preparation, A.Z.; Writing—Review and Editing, A.Z., E.T., I.O., S.R., A.A., S.A. and Y.T.; Supervision, Y.T.; Project Administration, Y.T. and S.A.; Funding Acquisition, Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the Committee of Science of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Program No BR24992903).

Data Availability Statement

The datasets generated and/or analyzed during the current study are available in the manuscript text and/or Supplementary Materials.

Acknowledgments

The authors acknowledge the staff of the Scientific and Production Center of Grain Farming named after A.I. Barayev (Akmola region, Kazakhstan) and the Karabalyk Agricultural Experimental Station (Kostanay region, Kazakhstan) for their assistance in the field assessment of the pea collection.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Food and Agriculture Organization (FAO). FAOSTAT Statistical Database. Available online: https://www.fao.org/faostat/ (accessed on 26 March 2026).
2. Burstin, J.; Gallardo, K.; Mir, R.R.; Varshney, R.K.; Duc, G. Improving protein content and nutrition quality. In Biology and Breeding of Food Legumes; Pratap, A., Kumar, J., Eds.; CAB International: Wallingford, UK, 2011; pp. 314–328.
3. Dahl, W.J.; Foster, L.M.; Tyler, R.T. Review of the health benefits of peas (Pisum sativum L.). Br. J. Nutr. 2012, 108, S3–S10.
4. Peoples, M.B.; Herridge, D.F.; Ladha, J.K. Biological nitrogen fixation: An efficient source of nitrogen for sustainable agricultural production? Plant Soil 1995, 174, 3–28.
5. Suleimenov, M. Trends in the agriculture of Central Asia and implications for rangelands and croplands. In Novel Measurement and Assessment Tools for Monitoring and Management of Land and Water Resources in Agricultural Landscapes of Central Asia; Springer International Publishing: Cham, Switzerland, 2013; pp. 91–105.
6. Zatybekov, A.; Turuspekov, Y.; Doszhanova, B.; Didorenko, S.; Abugalieva, S. Effect of population size on genome-wide association study of agronomic traits in soybean. Proc. Latv. Acad. Sci. Sect. B Nat. Exact Appl. Sci. 2020, 74, 244–251.
7. Zatybekov, A.; Abugalieva, S.; Didorenko, S.; Rsaliyev, A.; Maulenbay, A.; Fang, C.; Turuspekov, Y. Genome-wide association study for charcoal rot resistance in soybean harvested in Kazakhstan. Vavilov J. Genet. Breed. 2023, 27, 565.
8. Doszhanova, B.N.; Zatybekov, A.K.; Didorenko, S.V.; Suzuki, T.; Yamashita, Y.; Turuspekov, Y. Identification of quantitative trait loci of pod dehiscence in a collection of soybean grown in the southeast of Kazakhstan. Vavilov J. Genet. Breed. 2024, 28, 515–522.
9. Doszhanova, B.; Zatybekov, A.; Didorenko, S.; Fang, C.; Abugalieva, S.; Turuspekov, Y. Genome-wide association study of seed quality and yield traits in a soybean collection from Southeast Kazakhstan. Agronomy 2024, 14, 2746.
10. Zatybekov, A.; Genievskaya, Y.; Anuarbek, S.; Varshney, R.K.; Barmukh, R.; Kudaibergenov, M.; Turuspekov, Y.; Abugalieva, S. GWAS for QTLs associated with agronomic traits in Chickpea (Cicer arietinum L.) harvested in South-East Kazakhstan. BMC Plant Biol. 2026, 26, 42.
11. Tuberosa, R. Phenotyping for drought tolerance of crops in the genomics era. Front. Physiol. 2012, 3, 347.
12. Gali, K.K.; Sackville, A.; Tafesse, E.G.; Lachagari, V.B.R.; McPhee, K.; Hybl, M.; Muehlbauer, F.J.; Barbetti, M.J. Genome-wide association mapping for agronomic and seed quality traits in field pea (Pisum sativum L.). Front. Plant Sci. 2019, 10, 1538.
13. Klein, A.; Houtin, H.; Rond, C.; Marget, P.; Jacquin, F.; Boucherot, K.; Huart, M.; Rivière, N.; Boutet, G.; Lejeune-Hénaut, I.; et al. QTL analysis of frost damage in pea suggests different mechanisms involved in frost tolerance. Theor. Appl. Genet. 2014, 127, 1319–1330.
14. Zhang, C.; McGee, R.J.; Vandemark, G.J.; Sankaran, S. Crop performance evaluation of chickpea and dry pea breeding lines across seasons and locations using phenomics data. Front. Plant Sci. 2021, 12, 640259.
15. Weller, J.L.; Ortega, R. Genetic control of flowering time in legumes. Front. Plant Sci. 2015, 6, 207.
16. Tyagi, N.; Singh, A.K.; Rai, V.P.; Kumar, S.; Srivastava, C.P. Genetic variability studies for lodging resistance and yield attributes in pea (Pisum sativum L.). J. Food Legumes 2012, 25, 179–182.
17. Singh, K.P.; Singh, H.C.; Verma, M.C. Genetic analysis for yield and yield traits in pea. J. Food Legumes 2010, 23, 113–116.
18. Warkentin, T.D.; Smýkal, P.; Xu, P.; McPhee, K. Advances in pea breeding and genomics. Front. Plant Sci. 2024, 15, 1430421.
19. Nayyar, H.; Bains, T.; Kumar, S. Low temperature induced floral abortion in chickpea: Relationship to abscisic acid and cryoprotectants in reproductive organs. Environ. Exp. Bot. 2006, 57, 114–122.
20. Sadras, V.O. Evolutionary aspects of the trade-off between seed size and number in crops. Field Crops Res. 2007, 100, 125–138.
21. Kreplak, J.; Madoui, M.A.; Cápal, P.; Novák, P.; Labadie, K.; Aubert, G.; Bayer, P.E.; Gali, K.K.; Syme, R.A.; et al. A reference genome for pea provides insight into legume genome evolution. Nat. Genet. 2019, 51, 1411–1422.
22. Tayeh, N.; Klein, A.; Le Paslier, M.C.; Jacquin, F.; Houtin, H.; Rond, C.; Chabert-Martinello, M.; Magnin-Robert, J.-B.; Marget, P.; Aubert, G.; et al. Genomic prediction in pea: Effect of marker density and training population size and composition on prediction accuracy. Front. Plant Sci. 2015, 6, 941.
23. Boutet, G.; Carvalho, S.A.; Falque, M.; Peterlongo, P.; Lhuillier, E.; Bouchez, O.; Lavaud, C.; Pilet-Nayel, M.L. SNP discovery and genetic mapping using genotyping by sequencing of whole genome genomic DNA from a pea RIL population. BMC Genom. 2016, 17, 121.
24. Tibbs Cortes, L.; Zhang, Z.; Yu, J. Status and prospects of genome-wide association studies in plants. Plant Genome 2021, 14, e20077.
25. Korte, A.; Farlow, A. The advantages and limitations of trait analysis with GWAS: A review. Plant Methods 2013, 9, 29.
26. Ballén-Taborda, C.; Johnson, N.; Boatwright, L.; Lawrence, T.; Kay, J.; Windsor, N.; Madurapperumage, A.; Thavarajah, P.; Tang, L.; Shipe, E.; et al. Genome-wide association studies of nutritional traits in peas (Pisum sativum L.) for biofortification. Plant Genome 2025, 18, e70135.
27. Martins, L.B.; Balint-Kurti, P.; Reberg-Horton, S.C. Genome-wide association study for morphological traits and resistance to Peryonella pinodes in the USDA pea single plant plus collection. G3 Genes Genomes Genet. 2022, 12, jkac168.
28. Osuna-Caballero, S.; Rubiales, D.; Rispail, N. Genome-wide association study uncovers pea candidate genes and metabolic pathways involved in rust resistance. Plant Genome 2024, 17, e20510.
29. Mascher, M.; Schreiber, M.; Scholz, U.; Graner, A.; Reif, J.C.; Stein, N. Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding. Nat. Genet. 2019, 51, 1076–1081.
30. Smith, A.B.; Cullis, B.R.; Thompson, R. The analysis of crop cultivar breeding and evaluation trials: An overview of current mixed model approaches. J. Agric. Sci. 2005, 143, 449–462.
31. Meier, U. Growth Stages of Mono- and Dicotyledonous Plants; Julius Kühn-Institut: Quedlinburg, Germany, 2018.
32. Bates, D.; Mächler, M.; Bolker, B.; Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 2015, 67, 1–48.
33. Fisher, R.A. Statistical methods for research workers. In Breakthroughs in Statistics: Methodology and Distribution; Springer: New York, NY, USA, 1970; pp. 66–70.
34. Holland, J.B.; Nyquist, W.E.; Cervantes-Martínez, C.T.; Janick, J. Estimating and interpreting heritability for plant breeding: An update. In Plant Breeding Reviews; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2003; Volume 22.
35. Tayeh, N.; Aluome, C.; Falque, M.; Jacquin, F.; Klein, A.; Chauveau, A.; Bérard, A.; Houtin, H.; Rond, C.; Kreplak, J.; et al. Development of two major resources for pea genomics: The GenoPea 13.2 K SNP Array and a high-density, high-resolution consensus genetic map. Plant J. 2015, 84, 1257–1273.
36. Zheng, X.; Levine, D.; Shen, J.; Gogarten, S.M.; Laurie, C.; Weir, B.S. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 2012, 28, 3326–3328.
37. Saitou, N.; Nei, M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 1987, 4, 406–425.
38. Letunic, I.; Bork, P. Interactive Tree of Life (iTOL) v6: Recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 2024, 52, W78–W82.
39. Evanno, G.; Regnaut, S.; Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Mol. Ecol. 2005, 14, 2611–2620.
40. Bradbury, P.J.; Zhang, Z.; Kroon, D.E.; Casstevens, T.M.; Ramdoss, Y.; Buckler, E.S. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 2007, 23, 2633–2635.
41. Wickham, H. Data analysis. In ggplot2: Elegant Graphics for Data Analysis; Springer International Publishing: Cham, Switzerland, 2016; pp. 189–201.
42. Yu, J.; Pressoir, G.; Briggs, W.H.; Vroh Bi, I.; Yamasaki, M.; Doebley, J.F.; McMullen, M.D.; Gaut, B.S.; Nielsen, D.M.; Holland, J.B.; et al. Mixed-model method for association mapping. Nat. Genet. 2006, 38, 203–208.
43. Zhang, Z.; Ersoz, E.; Lai, C.Q.; Todhunter, R.J.; Tiwari, H.K.; Gore, M.A.; Bradbury, P.J.; Yu, J.; Arnett, D.K.; Ordovas, J.M.; et al. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 2010, 42, 355–360.
44. Yin, L.; Zhang, H.; Tang, Z.; Xu, J.; Yin, D.; Zhang, Z.; Yuan, X.; Zhu, M.; Zhao, S.; Li, X.; et al. rMVP: A memory-efficient, visualization-enhanced, and parallel-accelerated tool for genome-wide association study. Genom. Proteom. Bioinform. 2021, 19, 619–628.
45. Voorrips, R. MapChart: Software for the graphical presentation of linkage maps and QTLs. J. Hered. 2002, 93, 77–78.
46. Tang, D.; Chen, M.; Huang, X.; Zhang, G.; Zeng, L.; Zhang, G.; Wang, Y. SRplot platform. PLoS ONE 2023, 18, e0294236.
47. Carlson-Nilsson, U.; Aloisi, K.; Vågen, I.M.; Rajala, A.; Mølmann, J.B.; Rasmussen, S.K.; Niemi, M.; Wojciechowska, E.; Pärssinen, P.; Poulsen, G.; et al. Trait expression and environmental responses of pea (Pisum sativum L.) genetic resources targeting cultivation in the Arctic. Front. Plant Sci. 2021, 12, 688067.
48. Kebede, G.Y.; Eritro, T.A.; Gutu, D.T. Genotype × environment interaction and stability analysis of advanced field pea (Pisum sativum L.) genotypes in Southeastern Ethiopia. Ecol. Genet. Genom. 2024, 33, 100302.
49. Dehghani, H.; Ebadi-Segherloo, A.; Sabaghpour, S.H.; Roostaei, M. Study of Genotype × Environment Interaction for Chickpea Yield in Iran. Agron. J. 2010, 102, 1–8.
50. Aman, F.; Ara, N.; Shah, S.M.A. Genetic diversity among pea (Pisum sativum L.) genotypes for maturity and yield traits. Sarhad J. Agric. 2021, 37, 386–397.
51. Janusauskaite, D. Productivity of three pea (Pisum sativum L.) varieties as influenced by nutrient supply and meteorological conditions in boreal environmental zone. Plants 2023, 12, 1938.
52. Kosev, V.; Vasileva, V. Comparative biological characteristics of pea (Pisum sativum) varieties. Indian J. Agric. Sci. 2021, 91, 1280–1284.
53. Rana, J.C.; Rana, M.; Sharma, V.; Nag, A.; Chahota, R.K.; Sharma, T.R. Genetic diversity and structure of pea (Pisum sativum L.) germplasm based on morphological and SSR markers. Plant Mol. Biol. Report. 2017, 35, 118–129.
54. Hu, C.; Cun, J.; Soliman, A.A.; Yang, F.; Ghareeb, Z.E.; Yuan, X.; Yang, T.; Wang, X.; Zhang, J.; Xiang, C.; et al. Agronomic performance and yield stability of field pea (Pisum sativum L.) genotypes in multi-environment trials. BMC Plant Biol. 2025, 25, 1670.
55. Smýkal, P.; Aubert, G.; Burstin, J.; Coyne, C.J.; Ellis, N.T.; Flavell, A.J.; Ford, R.; Hýbl, M.; Macas, J.; Neumann, P.; et al. Pea (Pisum sativum L.) in the genomic era. Agronomy 2012, 2, 74–115.
56. Brhane, H.; Hammenhag, C. Genetic diversity and population structure analysis of a diverse panel of pea (Pisum sativum). Front. Genet. 2024, 15, 1396888.
57. Burstin, J.; Salloignon, P.; Chabert-Martinello, M.; Magnin-Robert, J.B.; Siol, M.; Jacquin, F.; Chauveau, A.; Pont, C.; Aubert, G.; Delaitre, C.; et al. Genetic diversity and trait genomic prediction in a pea diversity panel. BMC Genom. 2015, 16, 105.
58. Siol, M.; Jacquin, F.; Chabert-Martinello, M.; Smýkal, P.; Le Paslier, M.C.; Aubert, G.; Burstin, J. Patterns of genetic structure and linkage disequilibrium in a large collection of pea germplasm. G3 Genes Genomes Genet. 2017, 7, 2461–2471.
59. Pavan, S.; Delvento, C.; Nazzicari, N.; Ferrari, B.; D’Agostino, N.; Taranto, F.; Lotti, C.; Ricciardi, L.; Annicchiarico, P. Merging genotyping-by-sequencing data from two ex situ collections provides insights on the pea evolutionary history. Hortic. Res. 2022, 9, uhab062.
60. Jing, R.; Vershinin, A.; Grzebyta, J.; Shaw, P.; Smýkal, P.; Marshall, D.; Ambrose, M.J.; Ellis, T.N.; Flavell, A.J. The genetic diversity and evolution of field pea (Pisum) studied by high throughput retrotransposon-based insertion polymorphism (RBIP) marker analysis. BMC Evol. Biol. 2010, 10, 44.
61. Annicchiarico, P.; Nazzicari, N.; Pecetti, L.; Romani, M.; Russi, L. Pea genomic selection for Italian environments. BMC Genom. 2019, 20, 603.
62. Lester, D.R.; Ross, J.J.; Davies, P.J.; Reid, J.B. Mendel’s stem length gene (Le) encodes a gibberellin 3 beta-hydroxylase. Plant Cell 1997, 9, 1435–1443.
63. Hecht, V.; Knowles, C.L.; Vander Schoor, J.K.; Liew, L.C.; Jones, S.E.; Lambert, M.J.; Weller, J.L. Pea LATE BLOOMER1 is a GIGANTEA ortholog with roles in photoperiodic flowering, deetiolation, and transcriptional regulation of circadian clock gene homologs. Plant Physiol. 2007, 144, 648–661.

Figure 1. Phenotypic distribution and correlation analysis of pea collection. VER2—flowering time; VER8—maturity time; PH—plant height; HLAP—height of the lowest pod attachment; NPP—number of pods per plant; NSPP—number of seeds per pod; TSW—thousand seed weight; Ypm²—yield per m²; ***—p < 0.001; **—p < 0.01; *—p < 0.05.

Figure 2. Genetic diversity and population structure of pea accessions. (A) Principal Component Analysis (PCA) plot, showing the distribution of accessions colored by geographic origin group. (B) Phylogenetic tree illustrating genetic clustering of accessions by geographic origin group. (C) Phylogenetic tree illustrating genetic clustering of accessions by leaf type within the collection.

Figure 3. Population structure inference in pea collection. (A) ΔK plot for determining the optimal K value. (B) Pairwise kinship heatmap with hierarchical clustering. (C) Structure bar plot at K = 2 showing ancestry proportions.

Figure 4. SNP marker density (A) and linkage disequilibrium decay (B) across the Pisum sativum genome.

Figure 5. Physical positions of identified stable QTLs across the pea genome.

Table 1. Variability of key agronomic traits in the pea collection.

Trait	Min	Max	Mean	SD	CV (%)
VER2—flowering time, days	31.67	41.00	36.08	2.23	6.17
VER8—maturity time, days	72.28	85.72	79.19	3.00	3.79
PH—plant height, cm	33.44	134.44	72.02	17.68	24.54
HLAP—height of the lowest pod attachment, cm	18.96	75.01	43.61	11.11	25.48
NPP—number of pods per plant, count.	4.08	16.41	7.45	1.95	26.14
NSPP—number of seeds per pod, count.	3.67	7.67	5.49	0.62	11.28
TSW—thousand seed weight, g	121.84	347.89	202.43	30.90	15.26
Ypm²—yield per m², g	100.18	330.05	228.34	37.43	16.39

Notes: SD—standard deviation; CV—coefficient of variation.
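The CV column of Table 1 follows from the mean and SD columns as CV = 100 × SD / mean. The short check below recomputes it from the rounded values printed in the table; small differences in the second decimal are expected because the published CVs were presumably computed from unrounded data. The dictionary name `TABLE1` and the trait keys are illustrative.

```python
# (mean, SD, reported CV %) transcribed from Table 1.
TABLE1 = {
    "VER2": (36.08, 2.23, 6.17),
    "VER8": (79.19, 3.00, 3.79),
    "PH": (72.02, 17.68, 24.54),
    "HLAP": (43.61, 11.11, 25.48),
    "NPP": (7.45, 1.95, 26.14),
    "NSPP": (5.49, 0.62, 11.28),
    "TSW": (202.43, 30.90, 15.26),
    "Ypm2": (228.34, 37.43, 16.39),
}

def cv_percent(mean: float, sd: float) -> float:
    """Coefficient of variation expressed as a percentage of the mean."""
    return 100.0 * sd / mean

gaps = {}
for trait, (mean, sd, cv_reported) in TABLE1.items():
    cv_calc = cv_percent(mean, sd)
    gaps[trait] = abs(cv_calc - cv_reported)
    print(f"{trait:5s} recomputed CV = {cv_calc:6.2f}%  reported = {cv_reported:6.2f}%")

max_gap = max(gaps.values())
```

Because the CV is scale-free, it allows traits measured in days, centimetres, counts, and grams to be compared directly, which is how the low variability of VER2 and VER8 is set against the higher variability of PH, HLAP, and NPP.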

Table 2. ANOVA results and broad-sense heritability (H²) for evaluated traits.

Factors	Df	SS	MS	F-Value	p-Value	H²
PH
Genotype	183	306,241	1673	10.51	<2 × 10⁻¹⁶	0.925
Environment	1	7212	7212	45.298	3.29 × 10⁻¹¹
Genotype × Environment	169	58,305	345	2.167	1.52 × 10⁻¹²
Residuals	774	123,239	159
HLAP
Genotype	183	124,371	679.6	9.7	<2 × 10⁻¹⁶	0.889
Environment	1	1363	1362.7	19.449	1.18 × 10⁻⁵
Genotype × Environment	169	37,221	220.2	3.143	<2 × 10⁻¹⁶
Residuals	774	54,231	70.1
NPP
Genotype	183	4135	22.6	2.417	<2 × 10⁻¹⁶	0.759
Environment	1	527	526.9	56.359	1.66 × 10⁻¹³
Genotype × Environment	169	1849	10.9	1.17	0.0878
Residuals	774	7236	9.3
NSPP
Genotype	183	408.5	2.2	3.297	<2 × 10⁻¹⁶	0.712
Environment	1	416.5	416.5	615.366	<2 × 10⁻¹⁶
Genotype × Environment	169	336.6	2	2.943	<2 × 10⁻¹⁶
Residuals	774	523.9	0.7
TSW
Genotype	183	915,496	5003	7.098	<2 × 10⁻¹⁶	0.883
Environment	1	16,958	16,958	24.06	1.14 × 10⁻⁶
Genotype × Environment	169	259,189	1534	2.176	1.17 × 10⁻¹²
Residuals	770	542,702	705
Ypm²
Genotype	183	1,367,936	7475	1.294	0.011	0.672
Environment	1	46,399	46,399	8.029	0.005
Genotype × Environment	169	1,392,274	8238	1.426	0.001
Residuals	774	4,472,784	5779

Notes: PH—plant height; HLAP—height of the lowest pod attachment; NPP—number of pods per plant; NSPP—number of seeds per pod; TSW—thousand seed weight; Ypm²—yield per m²; Df—degrees of freedom; SS—sum of squares; MS—mean square; H²—broad-sense heritability.
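The F-values in Table 2 are consistent with each effect's mean square being tested against the residual mean square, F = MS(effect) / MS(residual). The sketch below recomputes them for PH and Ypm² from the rounded mean squares printed in the table, so agreement is only expected to within rounding. The dictionary layout and names are illustrative, and the assumption that the residual is the error term for every effect is inferred from the numbers rather than stated in the table.

```python
# Mean squares and reported F-values transcribed from Table 2 (PH and Ypm2 only).
TABLE2 = {
    "PH": {
        "ms_residual": 159.0,
        "effects": {
            "Genotype": (1673.0, 10.51),
            "Environment": (7212.0, 45.298),
            "Genotype x Environment": (345.0, 2.167),
        },
    },
    "Ypm2": {
        "ms_residual": 5779.0,
        "effects": {
            "Genotype": (7475.0, 1.294),
            "Environment": (46399.0, 8.029),
            "Genotype x Environment": (8238.0, 1.426),
        },
    },
}

def f_ratio(ms_effect: float, ms_residual: float) -> float:
    """F statistic with the residual mean square as the error term."""
    return ms_effect / ms_residual

rel_gaps = {}
for trait, block in TABLE2.items():
    for effect, (ms_effect, f_reported) in block["effects"].items():
        f_calc = f_ratio(ms_effect, block["ms_residual"])
        rel_gaps[(trait, effect)] = abs(f_calc - f_reported) / f_reported
        print(f"{trait:4s} {effect:24s} F = {f_calc:7.3f} (reported {f_reported})")
```

For yield, the genotype F-value is only about 1.3, which means that the genotype mean square barely exceeds the residual mean square; this is the numerical basis for the lower heritability and stronger environmental influence reported for this trait.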

Table 3. Identified stable QTLs associated with agronomic traits.

Trait	QTL	SNP	Chr	Position	Interval	p Value	Allele	Effect
PH	q.PH.5-1	PsCam037922_22979_691	5	639,901,919	636,138,355–639,901,919	6.21 × 10⁻¹²	T/C	−19.27
HLAP	q.HLAP.2-1	PsCam027153_15841_364	2	481,039,072	481,022,136–481,708,104	1.53 × 10⁻⁴	T/C	3.04
HLAP	q.HLAP.4-1	PsCam035766_20935_1194	4	229,356,151	229,356,151–233,353,825	2.98 × 10⁻⁴	T/C	2.69
HLAP	q.HLAP.4-2	PsCam014564_9838_294	4	423,140,921	422,688,134–423,140,921	1.26 × 10⁻⁴	A/G	−3.63
HLAP	q.HLAP.5-1	PsCam048465_31196_929	5	268,614,680	268,614,680–272,795,731	5.15 × 10⁻⁴	A/G	−4.59
HLAP	q.HLAP.5-2	PsCam037922_22979_691	5	639,901,919	636,138,355–643,508,245	1.85 × 10⁻⁷	T/C	−6.44
HLAP	q.HLAP.6-1	PsCam044890_28633_1016	6	339,117,378	339,117,378–341,865,789	7.17 × 10⁻⁴	T/C	4.19
NPP	q.NPP.3-1	PsCam053922_35659_106	3	288,834,651	288,832,606–288,834,651	7.73 × 10⁻⁷	A/G	1.88
NPP	q.NPP.5-1	PsCam004672_3514_1959	5	92,804,639	92,804,639	3.41 × 10⁻⁷	A/C	1.12
NSPP	q.NSPP.1-1	PsCam058842_39169_458	1	389,168,226	380,937,045–389,420,594	5.11 × 10⁻⁷	A/G	0.22
NSPP	q.NSPP.2-1	PsCam038676_23690_1430	2	113,478,831	112,717,076–115,095,368	4.00 × 10⁻⁵	A/C	−0.30
NSPP	q.NSPP.5-1	PsCam050232_32827_635	5	652,515,285	652,515,285	5.27 × 10⁻⁴	T/C	0.30
TSW	q.TSW.1-1	PsCam049395_32031_851	1	298,574,569	298,574,569–308,608,527	6.39 × 10⁻⁴	A/G	−17.31
TSW	q.TSW.2-1	PsCam053957_35679_340	2	1,606,863	1,531,745–1,606,863	1.12 × 10⁻⁵	T/C	14.36
TSW	q.TSW.2-2	PsCam008331_5911_76	2	24,434,716	22,229,061–25,729,249	4.32 × 10⁻⁴	A/G	−9.69
TSW	q.TSW.3-2	PsCam034487_19873_569	3	349,765,949	342,755,631–349,765,949	4.51 × 10⁻⁴	A/G	15.32
TSW	q.TSW.4-1	PsCam045098_28821_3331	4	213,098,673	210,832,276–213,100,347	2.46 × 10⁻⁴	T/C	17.27
TSW	q.TSW.4-2	PsCam014067_9588_1387	4	266,499,269	266,499,269–271,428,148	2.38 × 10⁻⁷	A/G	24.62
Ypm²	q.Ypm2.7-1	PsCam057800_38347_588	7	60,678,801	60,678,801–62,232,668	1.37 × 10⁻⁴	T/C	20.56

Notes: PH—plant height; HLAP—height of the lowest pod attachment; NPP—number of pods per plant; NSPP—number of seeds per pod; TSW—thousand seed weight; Ypm²—yield per m²; Chr—chromosome.

Table 4. Candidate genes and their predicted functions within identified stable QTL regions.

QTL	Gene	Molecular Function	Biological Process	Cellular Component
q.PH.5-1	Psat5g299720 (LE)	Gibberellin 3-beta-dioxygenase activity	Gibberellin biosynthetic process	-
q.HLAP.2-1	Psat2g178840	SNARE binding; SNAP receptor activity	Intracellular protein transport; Exocytosis, vesicle fusion; Vesicle-mediated transport; Vesicle docking; Membrane fusion	Plasma membrane; Endomembrane system; Membrane; SNARE complex
q.HLAP.4-1	Psat4g102480	-	-	Membrane
q.HLAP.4-2	Psat4g150760	Oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor; NAD binding	Obsolete oxidation-reduction process	-
q.HLAP.5-1	Psat5g121040	-	-	Plasmodesma
q.HLAP.5-2	Psat5g299720	Oxidoreductase activity	Obsolete oxidation-reduction process	-
q.HLAP.6-1	PHYA	Phosphorelay sensor kinase activity; Photoreceptor activity; Protein homodimerization activity	Phosphorelay signal transduction system; regulation of DNA-templated transcription; signal transduction; detection of visible light; red, far-red light phototransduction; protein-tetrapyrrole linkage	-
q.NPP.3-1	Psat3g110240	Catalytic activity; Protein binding	-	-
q.NPP.5-1	-	-	-	-
q.NSPP.1-1	-	-	-	-
q.NSPP.2-1	Psat2g046240	Catalytic activity; Glutamine synthetase activity	-	Cytoplasm
q.NSPP.5-1	Psat5g306040	Double-stranded DNA binding	Regulation of DNA-templated transcription	-
q.TSW.1-1	-	-	-	-
q.TSW.2-1	-	-	-	-
q.TSW.2-2	-	-	-	-
q.TSW.3-2	Psat3g137280	Copper ion binding; Oxidoreductase activity; Hydroquinone:oxygen oxidoreductase activity	Lignin catabolic process	Apoplast
q.TSW.4-1	Psat4g094200	Ubiquitin-protein transferase activity	Protein ubiquitination	-
q.TSW.4-2	Psat4g121160	GTPase activator activity	COPI coating of Golgi vesicle	Golgi membrane
q.Ypm2.7-1	Psat7g031840	Clathrin light chain binding; Structural molecule activity	Intracellular protein transport; Receptor-mediated endocytosis	Cytoplasmic vesicle membrane
