1. Introduction
Wheat (Triticum spp.) is a cornerstone of global food security and the most extensively cultivated cereal worldwide, providing approximately 55% of the world’s carbohydrate intake and 21% of its food calories [1], yet its value chains hinge on two biologically and technologically distinct species: durum and common (bread) wheat [2,3].
Durum wheat (Triticum durum Desf.; AABB, 2n = 4x = 28) produces very hard, vitreous kernels that mill into yellow-amber semolina rich in carotenoids, delivering strong but relatively inelastic gluten ideal for pasta and couscous [4,5]. Agronomically, durum wheat is traditionally favored in Mediterranean and other semi-arid systems, where it adapts well to heat and limited water availability. In regions facing heat and water constraints, identifying durum genotypes that combine competitive yield with stable semolina quality is therefore a strategic breeding priority [6,7]. In contrast, common wheat (T. aestivum L.; AABBDD, 2n = 6x = 42) spans soft to hard classes, is milled chiefly into fine flour, and forms extensible gluten networks suited to leavened breads [1,8].
Despite South America’s agricultural capacity, a marked supply–demand gap persists because internal demand for grain outpaces production [9]. Peru is a clear example: it harvested barely 0.20 Mt of mostly soft wheat in 2024/2025 yet imported more than 2 Mt to satisfy domestic demand, leaving local farmers with less than a 20% market share and exposing consumers to external price volatility [10].
In recent decades, wheat yields in Peru have shown an upward trend, particularly in highland regions such as Arequipa, La Libertad, Junín, Cusco, and Ayacucho. Arequipa has consistently achieved the highest productivity, reaching a national record of 7.12 t ha−1 in 2013, nearly three times the maximum yield reported in Junín (2.4 t ha−1 in 2015). However, by 2021, Arequipa and Junín accounted for only 6.4% and 3.3% of total national production, respectively, owing to their limited harvested area (1.7% and 2.4%). Expanding the agricultural frontier in the coastal valleys, where climatic conditions are more favorable, could enhance national self-sufficiency if the mean yield of 6.78 t ha−1 obtained in Arequipa between 2012 and 2021 were replicated nationwide [10]. These trends highlight the strong influence of local adaptation and environmental factors on wheat productivity and emphasize the need to characterize genotype-by-environment (G × E) interactions to guide future breeding and yield-stability improvement efforts.
Wheat breeding has now reached a stage where local adaptation, rather than broad adaptation, increasingly determines the ceiling for yield gains [11,12]. Experience with ostensibly “high-yield” lines shows that, once deployed outside their selection environment, their performance becomes unpredictable unless G × E responses are first quantified in the target region [13].
Advances such as fully annotated reference genomes, pangenomic panels, and genomic-selection pipelines have accelerated allele mining and parent selection [14]; however, wheat’s complex polyploid genome and the magnitude of G × E interactions still limit the translation of genomic predictions into consistent field gains [15]. The crop’s broad genetic base underpins notable phenotypic plasticity, buffering yield across thermally and hydrologically heterogeneous landscapes [16].
Recent multi-actor, multi-environment trials confirm that testing advanced lines with farmers across contrasting semi-arid sites can identify genotypes that combine high mean yield with stability, while generating agronomic information valued by end-users [17]. In Peru, where local durum wheat production remains limited and import dependence persists, identifying genotypes that maintain stable yields across environments is a national priority. Introducing locally adapted cultivars would not only reduce dependence on imports but also help conserve in situ genetic diversity, contributing to resilient food systems under the framework of the Convention on Biological Diversity [18,19].
To address this challenge, we implemented integrative stability analyses using the additive main effects and multiplicative interaction (AMMI) model, the genotype plus genotype-by-environment (GGE) biplot, the weighted average of absolute scores combined with mean performance (WAASBY) index, and the AMMI stability value (ASV), which together provide complementary insights into yield performance, stability, and adaptability [8,12,17,19,20,21]. These approaches allow the identification of both broadly and specifically adapted lines under semi-arid Peruvian conditions.
In wheat, AMMI models typically attribute more than 70% of the variance to environmental and interaction effects, efficiently distinguishing broadly adapted lines from those expressing specific adaptation [17,20,22,23]. Moreover, multi-environment trial (MET) syntheses in wheat indicate that environments still explain more than 40% of the total variation in grain yield, while genotypes and the G × E term contribute approximately 6–21% and 8–23%, respectively, figures that underscore the importance of interaction-focused statistical approaches [24]. In Eastern Europe, Jędzura et al. [25] reported that environmental main effects explained 89% of the variation across six spring wheat locations, with the AMMI Stability Value (ASV) and the Yield-Stability Index identifying STH 21-09 as the most widely adapted line.
Against this background, the present study evaluates eleven T. durum lines alongside the commercial variety ‘INIA 412 Atahualpa’ across three representative semi-arid sites in Arequipa, Peru. The findings not only advance Peru’s efforts toward wheat self-sufficiency but also exemplify how integrative G × E analytics can enhance breeding decisions in environments increasingly challenged by climate variability.
3. Results
3.1. Grain Yield (t ha−1)
The AMMI-based analysis of variance revealed significant effects of the genotype main factor (GEN; F = 2.85, p = 0.004) and of the genotype × environment interaction (G × E; F = 1.80, p = 0.035). In contrast, differences among environments were not significant (ENV; F = 1.43, p = 0.309), indicating that the differential response of genotypes across sites, rather than differences in site means, was the principal source of variation in grain yield. The first principal component (PC1) accounted for 55.2% of the interaction sum of squares and was marginally significant (p = 0.062), while the second principal component (PC2) explained the remaining 44.8% and was not significant (p = 0.083) (Table S3).
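The interaction decomposition behind these PC percentages can be sketched as follows. This is a minimal illustration, assuming a genotype × environment table of cell means and using randomly generated, hypothetical yields rather than the trial data; the function name `ammi_interaction_shares` is ours. It double-centres the table, applies a singular value decomposition, and reports the share of the G × E sum of squares carried by each axis. With only three environments, the interaction matrix has rank at most E − 1 = 2, which is why PC1 and PC2 together account for 100% of the interaction in this trial.

```python
import numpy as np


def ammi_interaction_shares(means: np.ndarray):
    """Share (%) of the G x E sum of squares carried by each interaction axis.

    `means` is a genotypes x environments table of cell means.
    """
    grand = means.mean()
    g_eff = means.mean(axis=1, keepdims=True) - grand  # genotype main effects
    e_eff = means.mean(axis=0, keepdims=True) - grand  # environment main effects
    ge = means - grand - g_eff - e_eff                  # double-centred G x E residuals
    singular = np.linalg.svd(ge, compute_uv=False)      # sorted in descending order
    ss_axis = singular ** 2                             # SS captured by each axis
    return 100 * ss_axis / ss_axis.sum(), ge


# Hypothetical yields (t ha-1) for 12 genotypes at 3 sites; not the trial data.
rng = np.random.default_rng(seed=1)
yields = rng.normal(loc=2.7, scale=0.3, size=(12, 3))
shares, ge = ammi_interaction_shares(yields)
print(np.round(shares, 1))  # third axis is ~0 because rank(ge) <= 3 - 1
```

In a formal AMMI ANOVA the axis sums of squares are multiplied by the number of replicates before the F-tests; this rescales them but leaves the percentages unchanged.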
The AMMI1 biplot for grain yield explained 55.2% of the G × E variation, with PC1 capturing the major interaction patterns among genotypes and environments. Across environments, mean yields ranged from 2.09 t ha−1 (TD-062) to 3.19 t ha−1 (TD-033), highlighting the broad phenotypic range among the durum wheat lines. In particular, TD-033 combined high performance with stability, showing small residuals across locations (≤±0.4 t ha−1).
Genotypes TD-037 (3.03 t ha−1) and TD-033 (3.19 t ha−1) recorded the highest mean grain yields, consistent with the highly significant genotype effect detected in the AMMI analysis (p = 0.004), indicating strong performance and responsiveness to favorable environments. Both genotypes exhibited small residuals (≤±0.8 t ha−1). Conversely, TD-062 (2.09 t ha−1) and TD-061 (2.20 t ha−1) had the largest negative residuals, −0.55 and −0.48 t ha−1, respectively.
Santa Elena had a positive PC1 score, indicating that this environment contributed favorably to grain yield expression. Genotypes located near Santa Elena, such as TD-061, TD-053, TD-020, and TD-037, had mean yields ranging from 2.20 to 3.03 t ha−1, indicating a strong positive interaction with this site.
Genotypes close to San Francisco de Paula, which had a negative PC1 score, showed low grain yields of 2.09 t ha−1 (TD-062), 2.41 t ha−1 (TD-043), and 2.43 t ha−1 (TD-044) with moderate residuals (≤±0.6 t ha−1). These lines may be considered locally adapted candidates that maintain their yield under restrictive environmental conditions, although their yield level is low.
Finally, Santa Rita exhibited a slightly negative PC1 score, with genotypes such as TD-001 and TD-030 clustered nearby. These genotypes had mean yields between 2.45 and 2.48 t ha−1 with moderate residuals (≤±0.4 t ha−1), indicating adaptability and stability under moderately restrictive conditions. Consequently, these lines can be regarded as locally adapted candidates for stable grain yield under semi-arid stress, maintaining consistent productivity despite environmental variation (Figure 3).
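The residuals discussed above come from comparing each cell mean with its AMMI1 fit. A minimal sketch, assuming a genotypes × environments table of means: the fitted value is the grand mean plus the genotype and environment main effects plus the first n multiplicative terms of the SVD. The function name `ammi_fit` and the 4 × 3 table are hypothetical illustrations, not the trial data.

```python
import numpy as np


def ammi_fit(means: np.ndarray, n_axes: int = 1) -> np.ndarray:
    """AMMI-n fitted cell means: grand mean + G + E + first n interaction terms."""
    grand = means.mean()
    g_eff = means.mean(axis=1, keepdims=True) - grand
    e_eff = means.mean(axis=0, keepdims=True) - grand
    ge = means - grand - g_eff - e_eff
    u, s, vt = np.linalg.svd(ge, full_matrices=False)
    interaction = (u[:, :n_axes] * s[:n_axes]) @ vt[:n_axes, :]  # rank-n G x E term
    return grand + g_eff + e_eff + interaction


# Hypothetical means (t ha-1): 4 genotypes x 3 sites.
observed = np.array([
    [3.2, 2.9, 3.4],
    [2.1, 2.6, 2.0],
    [2.8, 2.7, 2.9],
    [2.4, 3.0, 2.5],
])
predicted = ammi_fit(observed, n_axes=1)
residuals = observed - predicted
print(np.round(residuals, 2))
```

Because the discarded axes are themselves double-centred, AMMI1 residuals sum to zero within every genotype and every environment; with three sites, an AMMI2 fit reproduces the table exactly.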
The GGE “which-won-where” biplot for grain yield (PC1 = 63.2%, PC2 = 26.8%; cumulative = 90.02% of the G + GE variation) partitioned the genotype–environment plane into five sectors. San Francisco de Paula was located within the sector of vertex genotype TD-053, identifying this line as the highest-yielding and best-adapted entry under the edaphoclimatic conditions of this site. Santa Elena fell in the sector of TD-037, indicating the specific adaptation of this line to that location. In contrast, Santa Rita was located in the sector defined by TD-033, confirming the regional suitability and stable performance of this line despite the site’s negative PC1 score. The remaining sectors, defined by TD-061 and TD-062, did not include any of the tested environments. Meanwhile, genotypes located near the origin (TD-044, TD-020, and TD-001) combined intermediate grain yields with minimal interaction effects, reflecting broad, though not exceptional, adaptability across environments (Figure 4).
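The which-won-where reading can also be reproduced numerically. In a GGE biplot, the vertex genotype whose sector contains an environment is the genotype with the largest projection onto that environment’s vector, that is, the largest value in the rank-2 approximation of the environment-centred table. The sketch below assumes a table of genotype × environment means; the helper name `gge_winners` and the toy data (in which genotype “A” dominates every site) are hypothetical. Because the genotype–environment inner product does not depend on how the singular values are split between genotype and environment scores, the winners do not depend on the scaling chosen for plotting.

```python
import numpy as np


def gge_winners(means, genotypes, environments):
    """Winning genotype per environment from the two-axis GGE approximation."""
    centred = means - means.mean(axis=0, keepdims=True)  # keeps G + GE, removes E
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Symmetric scaling: sqrt of each singular value goes to each set of scores.
    g_scores = u[:, :2] * np.sqrt(s[:2])
    e_scores = vt[:2, :].T * np.sqrt(s[:2])
    fitted = g_scores @ e_scores.T  # inner products = biplot projections
    explained = 100 * s[:2] ** 2 / np.sum(s ** 2)  # % of G + GE per axis
    winners = {env: genotypes[int(np.argmax(fitted[:, j]))]
               for j, env in enumerate(environments)}
    return winners, explained


# Hypothetical 4 x 3 table; genotype "A" is best at every site.
means = np.array([
    [10.0, 10.0, 10.0],
    [1.0, 2.0, 3.0],
    [3.0, 1.0, 2.0],
    [2.0, 3.0, 1.0],
])
winners, explained = gge_winners(means, ["A", "B", "C", "D"],
                                 ["Site 1", "Site 2", "Site 3"])
print(winners, np.round(explained, 1))
```

Sites that share a winner form one candidate mega-environment. Vertex genotypes that win nowhere, such as TD-061 and TD-062 here, simply never attain the maximum for any environment column.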
The WAASBY index (weighted average of absolute scores combined with mean performance), which balances mean performance with stability, clearly discriminated the 12 genotypes. Values ranged from 9.09% (TD-062) to 93.72% (TD-033). Genotypes above the mean threshold (47.27%) were considered superior in yield–stability performance. Above-average performers (blue dots) included TD-033, which ranked first, followed by TD-014 (84.72%), TD-026 (65.57%), TD-020 (57.67%), TD-037 (50.45%), and TD-001 (50.07%), all of which combined high grain yield with stability, achieving WAASBY values above 50%. Conversely, below-average performers (red dots) included TD-062 (9.09%), TD-043 (14.76%), and TD-061 (27.85%), which showed poor yield and stability, whereas TD-053 (41.58%) and TD-044 (32.73%) displayed intermediate performance and stability (Figure 5).
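A minimal sketch of how WAASB and WAASBY can be computed is given below. It follows the published definitions: WAASB is the average of the absolute genotype scores on the interaction axes, weighted by each axis’s share of explained variance, and WAASBY is a weighted mean of yield and WAASB, each rescaled to 0–100 so that the most stable genotype scores 100 on the stability side. It makes two simplifications, stated as assumptions: the interaction matrix comes from fixed-effect cell means rather than from BLUPs of a mixed model, and genotype scores are scaled as u·√λ. Function names and the toy data are hypothetical.

```python
import numpy as np


def waasb(means: np.ndarray) -> np.ndarray:
    """Weighted average of absolute interaction scores per genotype (lower = more stable)."""
    grand = means.mean()
    ge = (means - means.mean(axis=1, keepdims=True)
          - means.mean(axis=0, keepdims=True) + grand)  # double-centred G x E
    u, s, _ = np.linalg.svd(ge, full_matrices=False)
    scores = u * np.sqrt(s)              # genotype scores on every axis (scaling assumed)
    ep = 100 * s ** 2 / np.sum(s ** 2)   # % of G x E explained by each axis
    return np.abs(scores * ep).sum(axis=1) / ep.sum()


def waasby(mean_perf, waasb_values, weight_y=50.0, weight_s=50.0):
    """Combine rescaled mean performance and rescaled stability (both 0-100)."""
    r_y = 100 * (mean_perf - mean_perf.min()) / (mean_perf.max() - mean_perf.min())
    # Lowest WAASB (most stable) maps to 100, highest to 0.
    r_w = 100 * (waasb_values.max() - waasb_values) / (waasb_values.max() - waasb_values.min())
    return (weight_y * r_y + weight_s * r_w) / (weight_y + weight_s)


# Hypothetical means (t ha-1): 5 genotypes x 3 sites.
means = np.array([
    [3.1, 3.2, 3.0],
    [2.2, 3.0, 1.9],
    [2.8, 2.6, 2.9],
    [2.0, 2.1, 2.2],
    [3.0, 2.4, 3.3],
])
w = waasb(means)
scores = waasby(means.mean(axis=1), w)
print(np.round(w, 3), np.round(scores, 1))
```

Genotypes are then split at the mean WAASBY value, in the same way as the 47.27% threshold used above.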
ASV values ranged from 0.03 (TD-014) to 0.83 (TD-053), while grain yield spanned 2.09 t ha−1 (TD-062) to 3.19 t ha−1 (TD-033). In the upper-left quadrant, TD-014 combined the lowest ASV with a yield of 2.85 t ha−1, identifying it as the most broadly adapted and stable genotype, while TD-001 (ASV: 0.23; 2.48 t ha−1) showed a similar but slightly less productive pattern. In the central band, TD-020 (ASV: 0.33; 2.83 t ha−1), TD-026 (ASV: 0.48; 2.90 t ha−1), and TD-033 (ASV: 0.40; 3.19 t ha−1) exhibited high productivity coupled with moderate interaction scores, indicating good but not absolute stability. In the upper-right quadrant, TD-037 (ASV: 0.68; 3.04 t ha−1) and TD-053 (ASV: 0.83; 2.68 t ha−1) achieved high yields at the expense of stability, reflecting greater genotype × environment interaction. Conversely, in the lower-right quadrant, TD-062, TD-061, and TD-043 combined below-average yields (<2.30 t ha−1) with ASV values exceeding 0.50, confirming their limited adaptability and poor agronomic performance (Figure 6).
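The ASV used in this kind of analysis is commonly defined as ASV = √[(SS_IPCA1 / SS_IPCA2 × IPCA1)² + (IPCA2)²]. This definition stretches each genotype’s PC1 score by the ratio of the axis sums of squares so that both axes contribute on a comparable scale. The sketch below is a minimal version, assuming genotype scores scaled as u·√λ (one common convention; software packages differ) and a hypothetical table built from known additive and interaction effects. The `classify` helper, also hypothetical, mirrors the quadrant reading used in the text, with the trial means as cut-offs.

```python
import numpy as np


def asv(means: np.ndarray) -> np.ndarray:
    """AMMI stability value per genotype from the first two interaction axes."""
    grand = means.mean()
    ge = (means - means.mean(axis=1, keepdims=True)
          - means.mean(axis=0, keepdims=True) + grand)
    u, s, _ = np.linalg.svd(ge, full_matrices=False)
    ipca = u[:, :2] * np.sqrt(s[:2])       # genotype IPCA1 and IPCA2 scores
    ss_ratio = s[0] ** 2 / s[1] ** 2       # SS_IPCA1 / SS_IPCA2 (needs s[1] > 0)
    return np.sqrt((ss_ratio * ipca[:, 0]) ** 2 + ipca[:, 1] ** 2)


def classify(mean_perf, asv_values):
    """Label genotypes by above/below-mean performance and ASV."""
    labels = []
    for y, a in zip(mean_perf, asv_values):
        level = "high yield" if y >= mean_perf.mean() else "low yield"
        stability = "stable" if a <= asv_values.mean() else "unstable"
        labels.append(f"{level}, {stability}")
    return labels


# Hypothetical table: additive effects plus a known, double-centred interaction.
# The first genotype has no interaction, so its ASV should be ~0.
interaction = np.array([[0.0, 0.0, 0.0], [0.3, -0.3, 0.0],
                        [-0.2, 0.0, 0.2], [-0.1, 0.3, -0.2]])
g_main = np.array([[0.4], [-0.1], [0.2], [-0.5]])
e_main = np.array([[0.1, -0.3, 0.2]])
means = 2.7 + g_main + e_main + interaction
values = asv(means)
print(np.round(values, 3), classify(means.mean(axis=1), values))
```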
3.2. Hectoliter Weight (kg hL−1)
The AMMI-based analysis of variance revealed highly significant effects of the genotype main factor (GEN; F = 4.30, p = 0.0001), the environment (ENV; F = 25.73, p = 0.001), and the genotype × environment interaction (G × E; F = 4.26, p = 0.00001), indicating that the differential response of genotypes across sites was not the only source of variation in test weight. The first two multiplicative axes together explained the entire interaction sum of squares, with PC1 alone capturing 89.5% of the G × E variance and being highly significant (p = 0.0001), while PC2 accounted for the remaining 10.5% and was not significant (p = 0.461) (Table S3).
Genotypes TD-014 (79.65 kg hL−1) and TD-044 (80.06 kg hL−1) exhibited the highest mean hectoliter weight, consistent with the highly significant genotype effect detected in the AMMI analysis (p = 0.0001). Conversely, TD-043 (78.31 kg hL−1) and TD-061 (78.33 kg hL−1) recorded the lowest mean values. Residuals for these genotypes were close to zero, reinforcing the reliability of the AMMI-predicted values.
Santa Elena showed a positive PC1 score, indicating a positive contribution to hectoliter weight expression. Genotypes located near Santa Elena, such as TD-061, TD-062, and TD-053, had values ranging from 78.33 to 78.89 kg hL−1. This clustering reflects a positive interaction between Santa Elena and these genotypes, driven by local environmental conditions.
In contrast, San Francisco de Paula had a negative PC1 score and was associated with genotypes such as TD-001 (77.81 kg hL−1), which displayed larger residuals (≤±3 kg hL−1). This positioning indicates that these genotypes have a less stable hectoliter weight under restrictive conditions.
Finally, Santa Rita occupied a central position on the biplot, with a small negative PC1 value, grouping genotypes TD-043, TD-044, and TD-014. These genotypes showed minimal residuals (≤0.8 kg hL−1) and can therefore be considered stable and less sensitive to fluctuations in hectoliter weight under moderately restrictive environments (Figure 7).
The “which-won-where” GGE biplot for hectoliter weight (PC1 = 67.73%, PC2 = 25.35%; cumulative = 93.08%) divided the genotype–environment plane into four sectors. In sector 1, corresponding to Santa Elena, genotype TD-061 dominated. Sector 2 encompassed Santa Rita, where genotypes TD-044 and TD-033 dominated, identifying them as the genotypes with the highest hectoliter weight under the soil and climate conditions of that site. Sector 3 encompassed San Francisco de Paula, which was captured by the sector headed by TD-001, indicating specific adaptation and a higher hectoliter weight of that genotype at this site. The remaining sector, defined by the TD-061 vertex, contained no tested environment. Genotypes located close to the origin (TD-026, TD-043, TD-020, TD-062, TD-037, and TD-053) combined intermediate hectoliter weight with minimal interaction, thus exhibiting broad, though not exceptional, adaptation (Figure 8).
In the WAASBY analysis, genotype TD-044 achieved the highest score, very close to 100%, followed by TD-014 (89.39%), in terms of grain quality measured as hectoliter weight. The second group included TD-037 (82.96%), TD-033 (81.78%), TD-026 (76.52%), TD-030 (72.37%), and TD-020 (65.71%), with scores above 65% but below 90%. These genotypes could be selected for grain quality studies owing to their moderate hectoliter weight. Genotypes TD-062 (61.86%), TD-053 (56.23%), TD-043 (53.88%), and TD-061 (42.02%) showed scores below the mean (red dots). At the lowest end of the scoring spectrum was TD-001 (77.81 kg hL−1; 0%), which exhibited low hectoliter weight and poor stability across environments (Figure 9).
The ASV analysis identified TD-033 (ASV: 0.29; 79.57 kg hL−1) as combining high stability with high hectoliter weight. It was followed by two genotypes of moderate hectoliter weight, TD-020 (ASV: 0.22; 78.66 kg hL−1) and TD-026 (ASV: 0.25; 79.00 kg hL−1), which combine intermediate-to-high hectoliter weight with stability. Genotype TD-044 showed the highest hectoliter weight (80.06 kg hL−1) and a moderate ASV of 0.55. Likewise, TD-014 exhibited a favorable hectoliter weight of 79.66 kg hL−1, indicating good grain quality; however, it also recorded the highest ASV (2.11). Finally, TD-001 had one of the highest ASVs (1.95) combined with the lowest hectoliter weight (77.81 kg hL−1), indicating poor stability across environments and low grain quality (Figure 10).
3.3. Plant Height (cm)
The AMMI-based analysis of variance (Table S3) revealed highly significant differences among genotypes and environments for plant height. The main effects of genotype (F = 14.35, p < 0.0001) and environment (F = 31.98, p < 0.0006) were statistically significant. However, the G × E interaction was not significant (F = 0.99, p < 0.48), indicating that the differential response of genotypes across sites was not the main source of variation in plant height. The first two multiplicative axes explained the interaction sum of squares, with PC1 accounting for 71.4% (p = 0.2399) and PC2 for 28.6% (p = 0.7915), neither contributing significantly to the interaction pattern.
Among the evaluated genotypes, TD-062 (118.81 cm) and TD-043 (113.73 cm) exhibited the greatest mean plant heights, consistent with the significant genotype effect detected in the AMMI analysis (p < 0.0001), indicating vigorous vegetative growth under favorable conditions. Conversely, TD-030 (90.85 cm) and TD-061 (90.67 cm) displayed the lowest values, reflecting limited growth potential. Genotypes TD-026, TD-033, and TD-020 combined moderate height with small residuals (≤±3.5 cm).
The environment Santa Elena exhibited positive PC1 scores, closely associated with genotypes TD-033 and TD-062, which achieved mean values of 97.65 cm and 118.81 cm, respectively. This pattern indicates that the environmental conditions in Santa Elena favored increased plant height.
In contrast, San Francisco de Paula presented negative PC1 scores, with genotype TD-037 (99.64 cm) located nearby, suggesting restricted vegetative development under this environment. Meanwhile, Santa Rita, positioned close to genotypes TD-014 and TD-026, exhibited a more variable response, with larger residuals (≤±8.7 cm), indicating lower stability and a stronger sensitivity to environmental fluctuations (Figure 11).
In the “which-won-where” GGE biplot for plant height (PC1 = 87.88%, PC2 = 8.65%; cumulative = 96.53%), all three environments (Santa Elena, Santa Rita, and San Francisco de Paula) were grouped within the same sector, defined by the vertex genotypes TD-062 and TD-043, indicating that these genotypes reached the greatest plant heights and showed broad adaptability under the prevailing edaphoclimatic conditions. In particular, TD-062 was associated with Santa Elena and Santa Rita, confirming its high responsiveness and vigorous growth in these environments, whereas TD-043 aligned more closely with San Francisco de Paula.
The remaining vertex genotypes (TD-061, TD-030, TD-001, and TD-037) defined their own sectors without any associated environment, indicating that these accessions did not perform best in any location and likely possessed lower or unstable adaptability across sites. Genotypes TD-033, TD-026, TD-020, and TD-053, located near the origin of the biplot, combined intermediate plant height with minimal G × E interaction, reflecting high stability and consistent performance across environments (Figure 12).
WAASBY values for plant height ranged from 21% (TD-037) to approximately 95% (TD-062). A threshold set at the overall average (approximately 50%) divided the genotypes into two groups. The first comprised genotypes with above-average performance (blue dots): TD-062 ranked first, followed by TD-043 and TD-033. These genotypes combined high performance with low WAASB scores (i.e., high stability), resulting in WAASBY values above 50%. The second comprised genotypes with below-average performance (red dots): TD-037 and TD-014 combined low height with low stability (<25%), while TD-026, TD-001, TD-061, and TD-044 showed intermediate WAASBY values (approximately 40–50%). The monotonic increase from TD-030 to TD-020 denotes a continuum rather than discrete classes, allowing breeders to choose breakpoints tailored to specific risk levels (Figure 13).
The ASV biplot assessed genotypic stability independently of mean plant height. ASV values spanned nearly an order of magnitude, from 0.40 (TD-044) to 3.13 (TD-014), while height ranged from 90.67 cm (TD-061) to 118.82 cm (TD-062). The lower-left quadrant, combining high stability with shorter stature, included TD-061 (ASV: 0.49; 90.67 cm), TD-044 (ASV: 0.40; 93.35 cm), TD-020 (ASV: 0.49; 95.41 cm), TD-053 (ASV: 1.02; 93.51 cm), TD-026 (ASV: 1.17; 96.09 cm), and TD-033. These genotypes offered desirable height with high stability, although reduced stature may have affected their performance. In the upper-middle region, TD-043 (ASV: 1.49; 113.73 cm) and TD-062 (ASV: 1.80; 118.81 cm) combined the greatest heights with medium stability. Finally, TD-001 (ASV: 2.24; 92.25 cm), TD-037 (ASV: 2.54; 99.65 cm), and TD-030 (ASV: 2.62; 90.85 cm) combined below-average heights with high ASV, confirming their poor agronomic prospects (Figure 14).
3.4. Thousand-Kernel Weight (g)
The combined ANOVA for wheat thousand-kernel weight (TKW) showed that environment accounted for 26.95% of the total sum of squares and genotype for 14.57%. The main effects of genotype (F = 6.68, p < 0.0297) and environment (F = 70.15, p < 0.0001) were statistically significant. The G × E interaction term was also highly significant (F = 2.93, p < 0.0003) and contributed an additional 18.20% of the total sum of squares, justifying a multivariate analysis using the AMMI framework. The first interaction principal component (PC1) captured 90% of the interaction sum of squares and was highly significant (p = 0.0001), whereas the second (PC2) explained only 10% and was not significant (p = 0.77) (Table S3).
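The TKW percentages are shares of the total sum of squares, so they need not sum to 100% across genotype, environment, and G × E; the remainder belongs to replicate (block) and error terms. A minimal sketch of the partition for a balanced trial is shown below, assuming an array of plot values indexed as genotype × environment × replicate and filled with hypothetical random data. The function name `ss_partition` is ours, and pooling blocks and error into one within-cell term is a simplification of the full MET model.

```python
import numpy as np


def ss_partition(y: np.ndarray):
    """Sums of squares for a balanced genotype x environment x replicate array."""
    n_g, n_e, n_r = y.shape
    grand = y.mean()
    cell = y.mean(axis=2)                 # genotype-by-environment cell means
    g_mean = cell.mean(axis=1)
    e_mean = cell.mean(axis=0)
    ss = {
        "genotype": n_r * n_e * np.sum((g_mean - grand) ** 2),
        "environment": n_r * n_g * np.sum((e_mean - grand) ** 2),
        "g_x_e": n_r * np.sum((cell - g_mean[:, None] - e_mean[None, :] + grand) ** 2),
        "within_cells": np.sum((y - cell[:, :, None]) ** 2),  # blocks + error, pooled
    }
    total = np.sum((y - grand) ** 2)
    shares = {term: 100 * value / total for term, value in ss.items()}
    return ss, total, shares


rng = np.random.default_rng(seed=7)
tkw = rng.normal(loc=55.0, scale=3.0, size=(12, 3, 3))  # hypothetical TKW (g)
ss, total, shares = ss_partition(tkw)
print({term: round(pct, 2) for term, pct in shares.items()})
```

In a balanced layout these components are orthogonal, so they add up exactly to the total sum of squares.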
The AMMI1 biplot for thousand-kernel weight (TKW) explained 90% of the total genotype-by-environment (G × E) variation, with the first principal component (PC1) accounting for most of the interaction effects. This high percentage indicates that the AMMI model effectively captured the main interaction structure. The observed and predicted values (Y and Ypred) showed close agreement, with residuals generally below ±4 g across genotype–environment combinations, confirming the robustness and predictive accuracy of the model.
Genotypes TD-014 (59.95 g) and TD-062 (58.67 g) exhibited the highest mean TKW values, indicating strong performance and stability under contrasting environments. These genotypes also presented low residuals (≤±2 g), reinforcing their stability across sites. Conversely, TD-001 (51.70 g) in Santa Elena and TD-030 (51.2 g) in San Francisco de Paula had large negative residuals of −8.71 g and −4.04 g, respectively, revealing poor adaptation and high sensitivity to environmental changes.
Santa Elena exhibited positive PC1 scores, indicating that this environment contributed positively to the expression of thousand-kernel weight. Genotypes positioned near Santa Elena on the positive PC1 axis (e.g., TD-037, TD-033, TD-030, and TD-061) displayed predicted AMMI values ranging from 51.35 to 52.9 g, suggesting that these lines responded favorably to the specific environmental conditions of this site. Such positioning implies a strong genotype–environment affinity, where the climatic and edaphic factors of Santa Elena favored kernel development and grain filling.
In San Francisco de Paula, the AMMI model revealed considerable variation in thousand-kernel weight (TKW) among genotypes, with observed values ranging from 51.70 g (TD-001) to 59.95 g (TD-014). Genotypes TD-014 and TD-043 exhibited the highest observed and predicted AMMI values, indicating superior grain development in this environment. Overall, San Francisco de Paula favored genotypes that combine above-average kernel weight with positive G × E interaction, notably TD-014 and TD-043, which can be considered locally adapted candidates capable of maintaining high TKW under restrictive environmental conditions.
In the AMMI1 biplot (Figure 15), Santa Rita exhibited a negative PC1 score, with genotypes such as TD-044, TD-020, and TD-026 clustered nearby, indicating their adaptation to moderately restrictive conditions. These genotypes can be considered locally adapted candidates for high TKW under restrictive environments, maintaining stable kernel weight despite environmental stress.
The “which-won-where” biplot for thousand-kernel weight explained 98.88% of the total genotype + genotype-by-environment (G + G × E) variation, with PC1 accounting for 57.5% and PC2 for 41.38%, indicating that most of the variability in TKW was captured by the first two components. San Francisco de Paula and Santa Rita grouped closely, showing similar discriminating ability and representativeness, whereas Santa Elena formed an independent sector, reflecting its differentiated agro-climatic conditions. Genotypes positioned at the vertices of the polygon were identified as the best performers in their respective environments. TD-014 and TD-062 performed best, and TD-026 and TD-053 showed affinity, in San Francisco de Paula; TD-043 excelled in Santa Rita; and no genotype performed best in Santa Elena. Some vertex genotypes (TD-033, TD-061, TD-030, TD-037, and TD-001) were not associated with any environment, indicating that they did not perform best in any of the tested locations. These genotypes exhibited specific responses that were not favored under the current environmental conditions, but they might outperform others in untested or more contrasting environments. Genotypes TD-014, TD-026, and TD-062, positioned near the origin, exhibited greater stability and average performance across environments. In contrast, genotypes TD-001 and TD-043, located far from the origin, displayed specific adaptability and higher sensitivity to environmental variation (Figure 16).
In the WAASBY analysis, TD-014 tops the list with 92.77%, evidencing its capacity to combine high thousand-kernel weight (TKW) with stability; TD-062 shows a similar pattern with 89.43%. These results corroborate the AMMI and GGE findings. Lines TD-044 and TD-043 are also above the 50% threshold, with scores of 65.79% and 65.31%, respectively. By contrast, the lower (red) tail is led by TD-001 (INIA 412 Atahualpa), whose score of 2.07% indicates very low TKW (Figure 17).
Standing out in the lower-left quadrant is line TD-053 (ASV: 0.79; 56.26 g), indicating reliable performance with stable grain weight. In the upper-left quadrant, TD-014 (ASV: 1.69; 59.96 g) shows the highest thousand-kernel weight with a low ASV. Conversely, the upper-right quadrant features TD-062 (ASV: 2.88; 58.67 g), with a high TKW but an elevated ASV, signaling instability. Finally, in the lower-right quadrant, the check variety TD-001 (ASV: 3.98; 51.70 g) exhibits the lowest TKW and high instability (Figure 18).
4. Discussion
The multi-environment evaluation revealed that genotype performance in grain yield and quality traits was highly context-dependent, as shown by significant genotype × environment (G × E) interactions for yield, hectoliter weight, and thousand-kernel weight. In our trials, the environmental main effect on yield was not significant, whereas G × E effects were pronounced, indicating that yield rankings shifted substantially across the three Peruvian sites. This pattern differs from the general experience in cereal METs, where environment often explains the majority of yield variation (e.g., ~80–90%) and G × E typically accounts for a sizeable share of the remainder. Even so, the yield instability observed here emphasizes that selecting broadly adapted genotypes requires explicit G × E analysis. Our results mirror findings in tropical maize and other cereals, where G × E can contribute over one-quarter of the total variance [41,42], and confirm that yield performance in durum wheat is highly location-specific unless stable genotypes are identified. Consistent with this, certain lines (e.g., TD-037 and TD-033) ranked first at one site but not at others, whereas a few genotypes maintained near-average yields everywhere. This reinforces that “high-yielding” lines are superior in a target region only if their G × E responses align with local conditions. Genotypes showing extreme sensitivity to micro-environmental fluctuation, such as TD-062 with both low mean yield and large yield swings, are typically undesirable for direct release. Breeding programs often discard such unstable lines or recycle them in crosses to break unfavorable linkages [43]. In our study, TD-062’s poor and inconsistent yield across environments exemplifies this: it likely carries alleles conferring stress susceptibility, making it a better candidate for trait introgression (passing on specific traits in new combinations) than for cultivation as a standalone variety.
The environmental contrasts among the test locations, although all are characterized as semi-arid, still imposed different stresses that drove crossover performance. The AMMI analysis indicated that the first interaction principal component (IPCA1) captured a dominant gradient, likely related to temperature and moisture differences across sites, as has been observed in other wheat trials. Indeed, studies in bread wheat and oats show that IPCA1 often reflects a composite environmental index (e.g., cooler/moister vs. hotter/drier sites) [44,45]. The second interaction axis (IPCA2) also explained a large portion of the G × E variance in our data, suggesting an additional underlying factor, possibly solar radiation during grain filling or soil fertility, affecting yield stability. Controlled-environment research supports this interpretation; for example, Li et al. [46] showed that reduced radiation can depress wheat yields by ~15%. Such insights point to the value of measuring key environmental covariates in future trials (e.g., incident light, temperature extremes, and soil moisture) to better interpret G × E patterns. Furthermore, climate change is expected to accentuate these environmental effects; meta-analyses project that rising heat and drought will affect yields and grain quality parameters simultaneously [47,48]. This lends urgency to breeding strategies that prioritize stability and resilience in both productivity and end-use quality under erratic weather conditions.
Our findings highlight a couple of durum lines achieving the sought-after balance of high mean yield and stability. Notably, TD-014 and TD-001 (the local check ‘INIA 412 Atahualpa’) produced near-average yields with minimal G × E interaction, indicating broad adaptation. These genotypes were positioned near the origin in both AMMI and GGE biplots, a favorable property for general-purpose cultivars. From an agronomic standpoint, such broadly adapted lines are valuable for risk-prone semi-arid systems because they buffer yield across variable conditions—a trait increasingly prized under climate volatility [
49]. In contrast, several other lines showed specific adaptation to particular site conditions. For example, TD-033 and TD-037 excelled in the higher-yielding environment of Santa Rita, while TD-053 and TD-061 thrived best at Santa Elena and San Francisco de Paula, respectively, as evidenced by their vertex positions on the GGE “which-won-where” biplot. Each of the three test locations effectively constituted a distinct mega-environment, with a different winning genotype—a scenario commonly seen in multi-site trials where crossover interactions lead to multiple mega-environments [
50]. Our GGE biplot accounted for over 90% of the total G + G × E variation in just two principal components, exceeding the ~80% threshold suggested for reliable visualization [
51,
52]. This high explanatory power gives confidence in the biplot’s delineation of mega-environments and genotype niches. Similar GGE-based studies in wheat and maize have reported two to four mega-environments within a region, so our finding of three is in line with expectations for diverse semi-arid sites. Notably, one genotype, TD-062, occupied a polygon vertex whose sector contained no environment, meaning that it “won” nowhere. This underscores that an extreme interaction score is not inherently beneficial; in TD-062’s case it reflected poor stability without any compensating yield advantage. Such cases have been documented in other GGE analyses of wheat, where outlier genotypes can be visually striking yet agronomically inferior [
53]. Thus, even as we exploit specific adaptation (e.g., deploying TD-053 in the environment where it excels), we must also recognize when a genotype’s instability outweighs its merits.
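The quantities behind these statements, namely the share of G + G × E captured by the first two principal components and the “which-won-where” winner in each environment, can be illustrated with a short Python/NumPy sketch. It assumes a complete, balanced table of genotype means. The function name gge_analysis and all yields are hypothetical and are not data from this trial. The environment-centred table is decomposed by singular value decomposition. The biplot winner in each environment is then taken from the rank-2 approximation, whose genotype-by-environment inner products do not depend on how the singular values are partitioned between genotype and environment scores.

```python
import numpy as np


def gge_analysis(y, genotypes, environments, n_pc=2):
    """GGE decomposition of a complete genotype x environment table of means.

    Rows of `y` are genotypes, columns are environments. Returns the share of
    G + GxE variation captured by the first `n_pc` principal components, the
    observed winner in each environment, and the winner implied by the
    rank-`n_pc` (biplot) approximation.
    """
    y = np.asarray(y, dtype=float)
    # Environment-centring removes the E main effect and keeps G + GxE.
    gge = y - y.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(gge, full_matrices=False)
    share = float(np.sum(s[:n_pc] ** 2) / np.sum(s ** 2))

    # Genotype-by-environment inner products shown by the biplot; these do not
    # depend on how singular values are split between genotype and environment scores.
    approx = (u[:, :n_pc] * s[:n_pc]) @ vt[:n_pc, :]

    observed = {env: genotypes[int(np.argmax(y[:, j]))] for j, env in enumerate(environments)}
    biplot = {env: genotypes[int(np.argmax(approx[:, j]))] for j, env in enumerate(environments)}
    return share, observed, biplot


# Hypothetical yields (t/ha) for four genotypes in three environments.
genotypes = ["G1", "G2", "G3", "G4"]
environments = ["E1", "E2", "E3"]
y = [[5.0, 3.0, 4.0],
     [4.0, 4.5, 3.5],
     [3.5, 3.2, 4.8],
     [4.2, 3.8, 4.1]]

share, observed, biplot = gge_analysis(y, genotypes, environments)
print(f"PC1 + PC2 share of G + GxE: {share:.1%}")
print("observed winners:", observed)
print("biplot winners:  ", biplot)
```

Environments that share a winner would be grouped into the same putative mega-environment. When the observed and biplot winners differ, or the two-component share falls well below the ~80% level mentioned above, the polygon view should be read with caution.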
From a breeding perspective, these results have direct implications for the selection and advancement of durum lines in Peru and similar environments. First, the convergence of evidence from multiple analyses (AMMI, GGE, and stability indices) increases confidence in our classification of candidates as broadly or specifically adapted. TD-033, for instance, was identified as a top performer in Santa Rita by both AMMI and GGE and, crucially, achieved this without sacrificing stability. This genotype also ranked first by the WAASBY index, a composite metric that integrates yield level with stability [
38], confirming that its high yield was accompanied by reliable performance across environments. Likewise, TD-014 and TD-026 achieved strong WAASBY scores, reflecting a robust combination of productivity and consistency that makes them strong candidates for wider release or further multi-location trials. In contrast, an environment-specific high yielder such as TD-053 (the top yielder at Santa Elena) was penalized in the WAASBY ranking, falling just below the overall mean. This occurs because WAASBY imposes a trade-off: genotypes with outstanding yield at one site but large fluctuations elsewhere receive a lower integrated score. Similar outcomes have been reported in other crops, for example in sugar beet and soybean multi-environment trials, where adjusting the yield:stability weighting of WAASBY (e.g., 60:40 instead of the default 50:50) has been suggested to better accommodate targeted adaptation. For our durum breeding program, this means that if a target environment is exceptionally stable or intensively managed (so that specific adaptation is desirable), the index weighting could be shifted toward raw performance. Under typical smallholder conditions in semi-arid regions, however, a balanced weighting (equal importance to yield and stability) seems prudent to avoid recommending “boom-and-bust” cultivars. The strength of WAASBY lies in condensing two important criteria into a single score, and in this study the WAASBY-based ranking agreed closely with the separate AMMI and GGE conclusions.
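A simplified sketch of the WAASBY calculation and of the yield:stability weighting discussed above follows. It is an assumption-laden illustration rather than the implementation used in this study. WAAS is computed here from a fixed-effect AMMI decomposition of genotype means, whereas the WAASB statistic underlying WAASBY is normally derived from the BLUP interaction matrix of a linear mixed model. IPCA scores are assumed to be the left singular vector times the square root of the singular value. Mean yield (higher is better) and WAAS (lower is better) are each rescaled to 0–100 and combined as a weighted mean. All function names and data are hypothetical.

```python
import numpy as np


def ammi_interaction(y):
    """Double-centred GxE residuals of a genotype x environment table of means."""
    y = np.asarray(y, dtype=float)
    return y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()


def waas(y, n_pc=None):
    """Weighted average of absolute IPCA scores (fixed-effect AMMI version, an assumption)."""
    ge = ammi_interaction(y)
    u, s, _ = np.linalg.svd(ge, full_matrices=False)
    # A g x e interaction table has at most min(g, e) - 1 non-trivial axes.
    k = n_pc if n_pc is not None else min(ge.shape) - 1
    explained = s[:k] ** 2 / np.sum(s ** 2)   # share of GxE SS per IPCA
    scores = u[:, :k] * np.sqrt(s[:k])        # assumed scaling: u * sqrt(singular value)
    return np.abs(scores) @ explained / explained.sum()


def rescale_0_100(x, higher_is_better=True):
    """Linear rescaling to 0-100 so that the best value receives 100."""
    x = np.asarray(x, dtype=float)
    r = 100.0 * (x - x.min()) / (x.max() - x.min())
    return r if higher_is_better else 100.0 - r


def waasby(y, w_yield=50.0, w_stability=50.0, n_pc=None):
    """Weighted mean of rescaled mean yield and rescaled (inverted) WAAS."""
    y = np.asarray(y, dtype=float)
    r_yield = rescale_0_100(y.mean(axis=1), higher_is_better=True)
    r_stab = rescale_0_100(waas(y, n_pc), higher_is_better=False)
    return (w_yield * r_yield + w_stability * r_stab) / (w_yield + w_stability)


# Hypothetical yields (t/ha): rows = genotypes, columns = environments.
y = [[5.0, 3.0, 4.0],
     [4.0, 4.5, 3.5],
     [3.5, 3.2, 4.8],
     [4.2, 3.8, 4.1]]
balanced = waasby(y)                     # 50:50 yield:stability
yield_weighted = waasby(y, 60.0, 40.0)   # 60:40, favours raw performance
print(np.round(balanced, 1), np.round(yield_weighted, 1))
```

Raising `w_yield` relative to `w_stability` moves the index toward raw performance, which is the adjustment suggested for targeted adaptation. With equal weights, yield and stability contribute equally.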
We also evaluated the traditional AMMI Stability Value (ASV) to complement WAASBY [
54]. ASV is useful for pinpointing the most stable genotypes regardless of yield level, for instance in early-generation selection when breeders wish to cull highly erratic lines. In our results, TD-014 had the lowest ASV, indicating the smallest interaction effect, and this line indeed proved to be an exceptionally consistent performer across sites. This is consistent with reports on durum wheat and on other crops such as chickpea, where low-ASV genotypes showed superior predictability under erratic rainfall regimes [
55]. Importantly, the highest-yielding line, TD-033, exhibited only a moderate ASV (≈0.41 in our analysis). This suggests that TD-033’s yield advantage did not come at the cost of marked instability, a favorable scenario indicating partial escape from the typical yield–stability trade-off. By contrast, TD-037 and TD-053, which were among the top yielders in specific environments, showed much higher ASV values (i.e., they contributed strongly to G × E). These two exemplify the classic trade-off: they can be “winners” under the right conditions but are less dependable across variable conditions. Similar trade-offs have been reported in other dryland cereals (e.g., pearl millet [
56]), reinforcing that such genotypes should be deployed only in well-characterized target environments or used in breeding to transfer their desirable traits into a more stable genetic background. Because an ideal cultivar must balance all of these attributes, it is instructive to integrate the results for secondary traits, namely plant height, thousand-kernel weight (TKW), and hectoliter weight (grain test weight), into this discussion.
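The ASV comparison above can be illustrated with a minimal sketch of the index in its commonly used form, ASV = √[(SS_IPCA1/SS_IPCA2 × IPCA1)² + IPCA2²], in which the IPCA1 score is up-weighted by the ratio of the first two interaction sums of squares. The function name, the score scaling (left singular vector times the square root of the singular value), and the data are assumptions for illustration. In the hypothetical data, genotype G1 is constructed to be purely additive, so its ASV should be essentially zero.

```python
import numpy as np


def asv(y):
    """AMMI stability value for each genotype (row) of a genotype x environment table of means."""
    y = np.asarray(y, dtype=float)
    # Double-centring leaves only the GxE interaction.
    ge = y - y.mean(axis=1, keepdims=True) - y.mean(axis=0, keepdims=True) + y.mean()
    u, s, _ = np.linalg.svd(ge, full_matrices=False)
    if len(s) < 2 or np.isclose(s[1], 0.0):
        raise ValueError("ASV needs two non-trivial IPCA axes (at least 3 genotypes and 3 environments)")
    ipca1 = u[:, 0] * np.sqrt(s[0])     # assumed score scaling: u * sqrt(singular value)
    ipca2 = u[:, 1] * np.sqrt(s[1])
    ss_ratio = s[0] ** 2 / s[1] ** 2    # SS(IPCA1) / SS(IPCA2); replicate multipliers cancel
    return np.sqrt((ss_ratio * ipca1) ** 2 + ipca2 ** 2)


# Hypothetical data: G1 is purely additive; G2-G4 interact with the sites.
g_eff = np.array([0.5, 0.0, -0.3, 0.2])
e_eff = np.array([0.0, -1.0, 0.4])
interaction = np.array([[0.0, 0.0, 0.0],
                        [1.0, -1.0, 0.0],
                        [-1.0, 0.0, 1.0],
                        [0.0, 1.0, -1.0]])
y = 4.0 + g_eff[:, None] + e_eff[None, :] + interaction
print(np.round(asv(y), 3))   # G1 should have ASV ~ 0
```

Because ASV ignores the yield level, a genotype can rank first on ASV while yielding poorly. This is why the index is paired with a yield-weighted measure such as WAASBY in the selection strategy discussed here.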
Interestingly, the tallest genotypes, TD-062 (118.8 cm) and TD-043 (113.7 cm), exhibited vigorous vegetative growth but produced comparatively low yields (<2.5 t ha⁻¹). In contrast, the highest-yielding genotypes (TD-020, TD-026, and TD-033) maintained an intermediate stature close to 100 cm, suggesting that excessive plant height may divert assimilates toward structural biomass rather than grain filling. This apparent inverse relationship between plant height and yield agrees with modern breeding outcomes reported by De Careddu et al. [
57], who documented that genetic reductions in plant stature, largely resulting from the incorporation of semi-dwarf Rht alleles, enhanced the harvest index (HI) and yield stability of Italian durum wheat cultivars.
From an agronomic perspective, genotypes exceeding 100 cm in height, such as TD-062 and TD-043, displayed vigorous vegetative growth in favorable environments (Santa Elena and Santa Rita) but may be prone to lodging or reduced yield efficiency at more stressful sites. Conversely, the intermediate-height genotypes (TD-033, TD-026, and TD-020), positioned near the origin of the GGE biplot, exhibited high stability and consistent performance across environments, indicating broad adaptability rather than site-specific preference. These findings highlight the importance of balancing plant stature and environmental adaptability in durum wheat breeding, and they support targeted selection of both stable and responsive genotypes in southern Peru.
Plant height in our study was strongly influenced by the genotype (G) and environment (E) main effects but showed no significant G × E interaction; in practical terms, each genotype maintained a relatively consistent height ranking across the three sites. This is common in the absence of major stress differentials: other wheat and barley studies have likewise found G and E effects on height (and heading date) to outweigh the interaction under moderate conditions [
24,
36]. Environmental factors did, however, shift absolute plant heights. The first IPCA for height, which accounted for 71.4% of the G × E sum of squares, suggests an underlying gradient such as altitude or temperature, although this axis should be interpreted cautiously because the interaction itself was not significant. Consistent with such a gradient, our warmest, lowest-elevation site (Santa Elena, with sandy soils) tended to produce the tallest plants (e.g., TD-062 reached ~119 cm there), whereas the cooler site favored shorter stature. Higher temperatures and possibly lower soil fertility at Santa Elena may have promoted internode elongation in responsive genotypes, albeit at the cost of stability. Agronomically, very tall wheat plants are undesirable in modern systems because of lodging risk and harvest difficulty, whereas very short plants can suffer from reduced biomass and light capture [
49]. The breeding goal for plant height is therefore an intermediate, stable stature, and encouragingly, we identified lines such as TD-020, TD-026, and TD-033 that fit this profile (~95–97 cm with minimal height variation). These lines combine manageable plant architecture with the yield and stability advantages discussed earlier, making them promising candidates for climate-resilient, mechanization-friendly cropping systems. By contrast, TD-062 and TD-043, the tallest entries and also among the most interaction-prone, would likely incur a higher lodging risk and are less suitable for broad cultivation. The GGE biplot for plant height provided a parallel insight: TD-062 occupied the vertex whose sector contained the hotter environments (Santa Elena and Santa Rita), confirming that its height was notably plastic under those conditions. Meanwhile, TD-014 and TD-037 formed vertices with no environment in their sectors, again indicating that extreme phenotypes (very short or very tall) did not translate into an adaptive advantage at our test locations. The stability indices reinforced these points: WAASBY and ASV for height identified TD-061 and TD-044 as the most stable short-statured genotypes (ideal for intensive management), whereas TD-037 and TD-014 had high ASV (unstable height) and would therefore be risky choices for farmers without further improvement. Our breeding program should thus prioritize medium-height lines and could explore indirect selection tools (such as canopy temperature or stay-green traits) to screen for genotypes that maintain yield stability without extreme stature. Incorporating physiological covariates into stability analysis has indeed been advocated in recent studies; for example, Al-Ashkar et al. [
58] successfully combined ASV with canopy temperature data to identify stress-resilient wheat genotypes. Similar multi-trait approaches in durum could help pinpoint genotypes with optimal architecture and water-use efficiency for semi-arid agriculture.
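The partitioning of variation into G, E, and G × E (as reported for plant height, and for the ~47% genotype share of TKW) can be sketched as a balanced two-way fixed-effect ANOVA, followed by an AMMI split of the interaction sum of squares into IPCA shares. This is a simplified illustration under stated assumptions. It treats replicates as independent within each environment, whereas a field trial laid out in blocks would also include a blocks-within-environments term, which is conventionally used as the error for testing E. Function names and plant heights are hypothetical and simulated without any G × E, so the IPCA shares here describe noise. The same caution applies whenever IPCA shares are read for a non-significant interaction.

```python
import numpy as np


def two_way_anova(x):
    """Balanced fixed-effect ANOVA for an array of shape (genotypes, environments, replicates)."""
    x = np.asarray(x, dtype=float)
    g, e, r = x.shape
    grand = x.mean()
    cell = x.mean(axis=2)                                    # genotype x environment means
    g_mean = cell.mean(axis=1)
    e_mean = cell.mean(axis=0)
    ge = cell - g_mean[:, None] - e_mean[None, :] + grand    # interaction residuals

    ss = {
        "G": e * r * np.sum((g_mean - grand) ** 2),
        "E": g * r * np.sum((e_mean - grand) ** 2),
        "GxE": r * np.sum(ge ** 2),
        "Error": np.sum((x - cell[:, :, None]) ** 2),
    }
    df = {"G": g - 1, "E": e - 1, "GxE": (g - 1) * (e - 1), "Error": g * e * (r - 1)}
    ss_total = np.sum((x - grand) ** 2)
    ms_error = ss["Error"] / df["Error"]

    table = {}
    for src in ss:
        ms = ss[src] / df[src]
        table[src] = {
            "SS": ss[src],
            "df": df[src],
            "F": ms / ms_error if src != "Error" else np.nan,
            "pct_SS": 100.0 * ss[src] / ss_total,   # share of total variation
        }

    # Share of the GxE sum of squares carried by each IPCA (AMMI decomposition).
    s = np.linalg.svd(ge, compute_uv=False)
    total_ge = np.sum(s ** 2)
    ipca_share = 100.0 * s ** 2 / total_ge if total_ge > 0 else np.zeros_like(s)
    return table, ipca_share


rng = np.random.default_rng(7)
# Hypothetical plant heights (cm): 5 genotypes x 3 environments x 3 replicates, no true GxE.
g_eff = np.array([-5.0, 0.0, 3.0, 8.0, -2.0])
e_eff = np.array([4.0, -3.0, -1.0])
heights = 100 + g_eff[:, None, None] + e_eff[None, :, None] + rng.normal(0, 2, size=(5, 3, 3))
table, ipca_share = two_way_anova(heights)
for src, row in table.items():
    print(f"{src:5s} df={row['df']:2d} SS={row['SS']:8.1f} F={row['F']:6.2f} %SS={row['pct_SS']:5.1f}")
print("IPCA shares of GxE SS (%):", np.round(ipca_share, 1))
```

In a balanced design the four SS components sum exactly to the total, so the `pct_SS` column gives the kind of variance shares quoted in the text.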
Grain quality traits displayed their own interaction patterns. Thousand-kernel weight (TKW), a yield component and proxy for grain size, was predominantly controlled by genetic differences among our genotypes (genotype accounted for ~47% of TKW variance in the ANOVA, vs. <10% for environment; Table S3). However, TKW still showed a significant G × E, indicating that some genotypes filled grain better in certain environments. For instance, TD-053 achieved one of the highest TKW values (>55 g) with low interaction, excelling especially at Santa Elena (the coastal valley site). TD-062 also produced large kernels (≈59 g) and was relatively stable for TKW, although, paradoxically, this did not translate into high yield, possibly because of trade-offs such as fewer grains per spike or poor tillering. TD-014, on the other hand, attained an outstanding TKW (~60 g), but only under specific conditions (with a clear affinity for the San Francisco de Paula environment) and at the cost of stability. This suggests that TD-014’s large grain size is expressed when cooler or more favorable grain-filling conditions occur, whereas its advantage diminishes at harsher sites. Such G × E for grain weight is reminiscent of farmers’ observations that certain landraces have plumper grains “only in good years.” The GGE biplot for TKW indicated that Santa Rita was dominated by one genotype (perhaps TD-044), while several high-TKW lines (TD-033, TD-061, TD-037, etc.) did not “win” any environment and appeared as non-winning vertices. This again underscores that extreme performance in one trait (grain size) does not guarantee broad adaptation; a genotype needs balanced performance across the whole suite of traits. Hectoliter weight (HLW), a key grain quality metric reflecting bulk density, showed significant G, E, and G × E effects, meaning that test weight is also subject to environmental modulation. Some environments (likely the cooler, higher-altitude site) produced generally higher HLW than others, which aligns with the known effects of heat and drought stress in curtailing grain filling and reducing grain density [
59]. Two lines, TD-044 and TD-014, stood out with superior HLW (>79 kg hL⁻¹) and near-origin positions in the biplots, indicating that they combined high grain density with stability. Interestingly, these were also among the most yield-stable genotypes, suggesting that yield stability and stable grain quality can be combined in the same genotype. This is encouraging for breeding, as it shows that quality need not be sacrificed for yield in these lines. Previous work has similarly found that certain durum genotypes can achieve both high test weight and adaptation to arid zones [
22]. In contrast, the check variety (TD-001, Atahualpa) had the lowest HLW (~77.8 kg hL⁻¹) together with high instability (ASV > 1.9). Its GGE placement indicated specific adaptation to one environment (San Francisco de Paula) but poor performance elsewhere, possibly reflecting the limitations of its older genetic background in grain filling under stress. From a quality standpoint, low and erratic HLW is problematic for semolina processors, which is why newer lines such as TD-044 and TD-014 are preferable. The WAASBY index applied to HLW further confirmed the superiority of TD-044 and TD-014, which had the best combined performance–stability scores for grain density. By contrast, TD-062, TD-043, and TD-001 fell well below average in both HLW and stability. Their inconsistent quality across environments would likely translate into unacceptable variation in end-use quality (for example, inconsistent semolina lots), so they are better suited as parents in breeding (e.g., to transfer specific disease resistances or physiological traits) than as cultivars. Notably, TD-062, although agronomically poor in yield, had large, heavy kernels (high TKW), which could make it useful in crosses aimed at improving grain size or quality in a better-adapted genetic background.
Overall, our study demonstrates the value of integrated stability analyses for guiding breeding decisions. The complementary tools (AMMI, GGE, WAASBY, and ASV) provided a nuanced picture of each genotype’s profile, and in many cases they agreed, which strengthens our conclusions. For example, all approaches identified TD-014 and TD-033 as among the most desirable lines, whereas TD-062 and TD-043 consistently ranked at the bottom. Such concordance is reassuring and has been noted in other recent studies that combine multiple indices for selection. In practice, we propose a two-stage selection strategy: (1) early in the breeding cycle, use ASV (or a similar pure stability measure) to eliminate highly unstable genotypes (as we would now do with TD-062, TD-061, etc.); and (2) in later stages, apply WAASBY or multi-trait indices to choose the entries that best balance yield with stability. This approach helps ensure that promising high-yielding lines are not discarded because of a single poor performance, since WAASBY rewards their mean yield provided they are not too erratic. Conversely, it prevents the advancement of high-yielding but erratic lines that look good in one trial and fail elsewhere. Our recommendations echo the findings of Kyratzis et al. [
60], who emphasized assessing several stability parameters (AMMI- and regression-based, among others) for both agronomic and quality traits to inform durum wheat breeding under Mediterranean conditions. They found that different stability metrics can correlate poorly with one another and with trait means, implying that a multi-criteria approach is needed for reliable selection, a point borne out by our experience here. Furthermore, recent advances suggest that incorporating physiological and molecular indicators alongside statistical indices can enhance selection precision. For instance, Rutkoski et al. [
61] demonstrated improved G × E predictability when stability indices were coupled with traits such as canopy temperature depression or NDVI, and genomic tools can also assist in predicting stability. Such integrative methods could be trialed in our program to accelerate gains in drought tolerance and yield stability. Notably, our stable, high-performing lines (e.g., TD-033 and TD-014) may serve as excellent parental stock for crossing, as they likely carry alleles for both performance and resilience. In parallel, the identification of specifically adapted lines (such as TD-053 at Santa Elena) allows the breeding program to consider releasing them directly in their niche environments, where they can outperform broadly adapted ones.
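The two-stage strategy proposed above can be sketched as a filter-then-rank procedure. In this minimal illustration, stage 1 removes entries whose ASV exceeds a chosen quantile, and stage 2 ranks the survivors by WAASBY. The 0.75 quantile cut-off, the number of entries advanced, and the genotype labels and scores are hypothetical choices, not values derived from this study.

```python
import numpy as np


def two_stage_selection(names, asv, waasby, cull_quantile=0.75, n_advance=3):
    """Stage 1: cull the most interaction-prone entries by ASV (hypothetical cut-off).
    Stage 2: rank the remaining entries by WAASBY (higher is better)."""
    asv = np.asarray(asv, dtype=float)
    waasby = np.asarray(waasby, dtype=float)
    threshold = float(np.quantile(asv, cull_quantile))
    keep = asv <= threshold
    order = np.argsort(-waasby, kind="stable")   # indices by descending WAASBY
    ranked = [names[i] for i in order if keep[i]]
    culled = [names[i] for i in range(len(names)) if not keep[i]]
    return ranked[:n_advance], culled, threshold


names = ["G1", "G2", "G3", "G4", "G5"]
asv_scores = [0.2, 0.4, 1.9, 0.3, 1.2]
waasby_scores = [80.0, 95.0, 99.0, 60.0, 70.0]
advanced, culled, cut = two_stage_selection(names, asv_scores, waasby_scores)
print(advanced, culled, cut)   # ['G2', 'G1', 'G5'] ['G3'] 1.2
```

WAASBY is supplied precomputed for the full set of entries rather than recalculated after culling, because its 0–100 rescaling depends on which genotypes are included; recomputing it on the survivors alone would shift every score.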