Do We Need to Breed for Regional Adaptation in Soybean?—Evaluation of Genotype-by-Location Interaction and Trait Stability of Soybean in Germany

Soybean is a crop in high demand, in particular as a crucial source of plant protein. As a short-day plant, soybean is sensitive to the latitude of the growing site. Consequently, varieties that are well adapted to higher latitudes are required to expand the cultivation. In this study, we employed 50 soybean genotypes to perform a multi-location trial at seven locations across Germany in 2021. Two environmental target regions were determined following the latitude of the locations. Adaptation and trait stability of seed yield and protein content across all locations were evaluated using Genotype plus Genotype-by-Environment (GGE) biplots and Shukla’s stability variance. We found a moderate level of crossing-over type genotype-by-location interaction across all locations. Within the environmental target regions, the genotype-by-location interaction could be minimised. Despite the positive correlation (R = 0.59) of seed yield between the environmental target regions and the same best-performing genotype, the genotype rankings differed in part substantially. In conclusion, we found that soybean can be grown at a wide range of latitudes across Germany. However, the performance of genotypes differed between the northern and southern locations, with an 18.8% higher mean yield in the south. This in combination with the observed rank changes of high-performing genotypes between both environmental target regions suggests that selection targeted towards environments in northern Germany could improve soybean breeding for those higher latitude regions.


Introduction
Soybean (Glycine max (L.) Merr.) is a leguminous crop of globally high economic importance. Due to its high protein content, it is mainly used as animal feed. However, soybean products, such as tofu and dairy alternatives, are also very popular for human consumption with a steadily increasing demand in Germany [1,2] and worldwide [3]. In addition, soybean is also grown for its high oil content. As a crop that was domesticated in temperate China [4], soybean production requires warmer temperatures [5]. The largest producers are the USA, Brazil, and Argentina [6]. As a result of selection and breeding efforts, cultivars that mature earlier and are more tolerant to cooler temperatures are available nowadays [7,8]. This enables the successful cultivation of soybean at higher latitudes, like in Germany, despite the shorter growth periods and the generally lower temperatures, especially in spring. In Germany, the majority of the soybean acreage lies in the southern half and to date, there is only very little soybean cultivation in the more northern regions [9]. The southern regions are generally better suited for soybean cultivation due to temperature, rainfall patterns, solar radiation, and shorter photoperiod [5]. Photoperiod is a decisive factor in soybean cultivation since soybean is originally a short-day plant and as such requires

Results
This study analyses single-year data from seven locations with a focus on soybean performance in different regions in Germany. The locations in our study ( Figure 1, Table 1, Figure S1) were separated into two environmental target regions based on their latitude, as this represents an important factor of soybean adaptation. Furthermore, this approach separates locations in the south, where soybean is already established, from the locations at higher latitudes, where there is barely any soybean cultivation. Therefore, the environmental target region in the southern half of Germany (STH) comprised the locations Eckartsweier (EWE), Hohenheim (HOH), Moosburg an der Isar (MOS), Niedertraubling (NIT), and Landshut (LDH). The environmental target region of the northern half of Germany (NTH) was assigned to the locations Biendorf (BID) and Gülzow-Prüzen (GÜL). traits and within NTH for seed yield. Heritability across all locations was high to very high for all traits, ranging from 0.79 for plant height to 0.90 for seed yield and 0.94 for protein content (Table 2). In addition, heritability was very high for seed yield in STH (0.91), but in NTH was substantially lower at 0.55. For seed yield, repeatability in single locations ranged from moderate in GÜL and BID (0.53 and 0.54) to very high in NIT and MOS (0.88 and 0.92). For protein content, repeatability within single locations ranged from moderate in BID (0.55) to very high in EWE and NIT (0.94).   Table 1. Environmental characterization of test locations, including latitude (LAT), altitude above sea level (ALT), mean temperature (TEMP), and precipitation sum (PCPN) across the growth period and for each month within the growth period, soil group (SG), soil type (ST), and sowing date (SD). Across all seven locations, the mean seed yield was 30.37 dt ha −1 ( Table 2) and across locations within the northern and southern half of Germany, the mean seed yield was 26.83 dt ha −1 and 31.82 dt ha −1 , respectively. Within single locations, mean seed yield ranged from 24.26 dt ha −1 for BID to 36.57 dt ha −1 for EWE. Protein content was high with an average of 45.09% across all locations. In single locations mean protein content ranged from 42.87% in MOS to 47.29% in EWE. Note that data on protein content was not available for GÜL. Seed yield was significantly (p < 0.05) positively correlated between all locations except for GÜL and MOS ( Figure S2). For protein content, all locations were highly significantly (p < 0.001) positively correlated ( Figure S3). Most trait correlations were not significant ( Figure S4). Exceptions were protein and oil content that were significantly (p < 0.001) negatively correlated (R = −0.91). Oil content was significantly (p < 0.05) positively correlated with seed yield at a moderate level (R = 0.31) and consequently protein content and seed yield were moderately negatively correlated (R = −0.20), though this was not significant. In addition, protein content was significantly (p < 0.05) positively correlated with days to maturity (R = 0.43), indicating that later maturing genotypes tended to have a higher protein content. Table 2. Means (X), minimum (X min ), and maximum (X max ) of BLUPs, genotypic variance (σ 2 G ), location variance (σ 2 L ), genotype-by-location interaction variance (σ 2 GL ), error variance (σ 2 ε ), heritability (h 2 ) and repeatability (w 2 ), and σ 2 GL to σ 2 G ratio (

EWE
) across all environments, across environments within environmental target regions, and within single environments for seed yield (SY) in dt ha −1 , protein content (PC) in %, oil content (OC) in %, plant height (PH) in cm, kernel dry matter (KDM) in %, and time between sowing and maturity (DTM) in days. The genotypic variance was significant for all traits across locations, within environmental target regions STH and NTH, as well as within single locations ( Table 2). In all cases was the genetic variance larger than the genotype-by-location interaction variance. The location variance was larger than the genotypic variance across all locations for all traits and within NTH for seed yield. Heritability across all locations was high to very high for all traits, ranging from 0.79 for plant height to 0.90 for seed yield and 0.94 for protein content ( Table 2). In addition, heritability was very high for seed yield in STH (0.91), but in NTH was substantially lower at 0.55. For seed yield, repeatability in single locations In Figure 2a,c, the scaled seed yield and protein content for each genotype within each location are shown. For example, genotype G35 showed a high seed yield relative to the other genotypes within all locations and G38 showed very high protein content relative to the other genotypes in all locations. Figure 2b,d show boxplots of the best linear unbiased predictors (BLUP) within each location. The red lines indicate that for both seed yield and protein content, the genotypes performing best in at least one location had higher values than average in most locations. However, clear crossing-over interaction for seed yield, even for the highest-performing genotypes, could be observed (Figure 2b). The angles between vectors in the Genotype plus Genotype-by-Environment (GGE) biplot for seed yield indicate a positive correlation between HOH, NIT, and LDH and between BID, GÜL, and EWE ( Figure 3a). The performance vs. stability biplot (Figure 3b) indicates that G35 had the highest seed yield across locations while being mostly stable (relatively short red dotted line), whereas G39 was the lowest yielding genotype but had similar stability. Shukla's stability variance ranged from 2.27 for G24 to 42.48 for G44 for seed yield (Table S1). As seed yield and stability variance for seed yield were significantly Figure 2. Heatmaps of (a) mean seed yield and (c) mean protein content for each genotype scaled within each environment to visualise the relative performance of the genotypes between environments. Boxplots of (b) seed yield and (d) protein content at each location. Red dots and lines show genotypes with the highest performance in at least one location. Green dots and lines show the performance of the remaining genotypes and the differences between locations. The angles between vectors in the Genotype plus Genotype-by-Environment (GGE) biplot for seed yield indicate a positive correlation between HOH, NIT, and LDH and between BID, GÜL, and EWE ( Figure 3a). The performance vs. stability biplot (Figure 3b) indicates that G35 had the highest seed yield across locations while being mostly stable (relatively short red dotted line), whereas G39 was the lowest yielding genotype but had similar stability. Shukla's stability variance ranged from 2.27 for G24 to 42.48 for G44 for seed yield (Table S1). As seed yield and stability variance for seed yield were significantly (p < 0.05) negatively correlated (R = −0.37) (Figure 4a), rankings between both measures differed.    For protein content, the vector angles in the GGE biplot indicate positive correlations between all locations (Figure 3c). The respective performance vs. stability biplot ( Figure  3d) shows G38 as the highest-performing but unstable genotype, and G47 as lowest performing while being relatively stable for protein content. The stability variance for protein content ranged from 0.03 for G43 to 6.01 for G50 (Table S2). Protein content and stability variance for protein content were significantly (p < 0.05) negatively correlated (R = −0.32) ( Figure 4b). As observed for the means for seed yield and protein content ( Figure S4), also their stability variances were not significantly correlated (Figure 4c). In NTH, seed yield ranged from 21.6 dt ha −1 to 30.47 dt ha −1 , whereas in STH seed yield was generally higher, ranging from 21.54 dt ha −1 to 38.77 dt ha −1 . Seed yield between NTH and STH was significantly (p < 0.05) positively correlated at R = 0.59 ( Figure 5c). Further, the top performing genotype (G35) was the same in both NTH and STH (Figure 5a,b, Table 3), but for the other genotypes, some moderate crossing-over can be observed in the respective boxplot ( Figure 5b). In addition, stability rankings according to Shukla's stability variance were different between both environmental target regions. Stability variance for NTH ranged from 0 for G3, G8, G15, G24, G25, and G46 as the most stable genotypes to 83.21 for G44 as the least stable genotype. For STH, stability variance ranged from 0.40 for G40 as the most stable and 38.57 for G44 as the least stable genotype (Table 3). However, stability analysis for five genotypes (G5, G7, G16, G21, G42, and G50) was not possible due to missing observations in GÜL. GGE biplots, ranking genotypes based on both their performance and stability within both environmental target regions compared to a computed fictional ideal genotype, show differences in the most suitable genotypes for NTH and STH, except for G35, which emerged as most suitable for both (Figure 5d,e). shows G38 as the highest-performing but unstable genotype, and G47 as lowest performing while being relatively stable for protein content. The stability variance for protein content ranged from 0.03 for G43 to 6.01 for G50 (Table S2). Protein content and stability variance for protein content were significantly (p < 0.05) negatively correlated (R = −0.32) (Figure 4b).
As observed for the means for seed yield and protein content ( Figure S4), also their stability variances were not significantly correlated (Figure 4c). In NTH, seed yield ranged from 21.6 dt ha −1 to 30.47 dt ha −1 , whereas in STH seed yield was generally higher, ranging from 21.54 dt ha −1 to 38.77 dt ha −1 . Seed yield between NTH and STH was significantly (p < 0.05) positively correlated at R = 0.59 (Figure 5c). Further, the top performing genotype (G35) was the same in both NTH and STH (Figure 5a,b, Table 3), but for the other genotypes, some moderate crossing-over can be observed in the respective boxplot (Figure 5b). In addition, stability rankings according to Shukla's stability variance were different between both environmental target regions. Stability variance for NTH ranged from 0 for G3, G8, G15, G24, G25, and G46 as the most stable genotypes to 83.21 for G44 as the least stable genotype. For STH, stability variance ranged from 0.40 for G40 as the most stable and 38.57 for G44 as the least stable genotype (Table 3). However, stability analysis for five genotypes (G5, G7, G16, G21, G42, and G50) was not possible due to missing observations in GÜL. GGE biplots, ranking genotypes based on both their performance and stability within both environmental target regions compared to a computed fictional ideal genotype, show differences in the most suitable genotypes for NTH and STH, except for G35, which emerged as most suitable for both (Figure 5d,e).

Phenotypic Performance and Variance Components
Very high heritabilities across locations for both seed yield (0.90) and protein content (0.94) in conjunction with a lower, however highly significant (p < 0.001), genotype-bylocation interaction variance component (σ 2 GL ), illustrate a moderate level of genotypeby-location interaction ( Table 2). For both traits, genotype-by-location interaction of the crossing-over type was observed, indicating the benefit of forming environmental target regions as subgroups of locations (Figure 2b,d). Importantly, to evaluate the repeatability of the observed interaction patterns, further evaluation across several years is required. The heritability for seed yield in STH was very high, but for NTH was substantially lower at 0.55. Accordingly, the two northern locations GÜL and BID forming the environmental target region NTH both showed lower repeatabilities (0.54 and 0.53), indicating a lower data quality in comparison to the other locations. This could be due to more inhomogeneous field conditions that could not sufficiently be accounted for by the field design or adaptation issues of at least some of the genotypes.

Performance and Trait Stability of Genotypes
In practical breeding, combining high performance and trait stability is a key objective to ensure reliable yields across different locations and years. In the case of this study, evaluating single-year data, yield stability via Shukla's stability variance is analysed by estimating a genotype-specific σ 2 GL . Genotypes with stability variance σ 2 i = 0 do not show any genotype-by-location interaction and are therefore considered perfectly stable across locations.
The top-performing genotypes for seed yield differed between locations showing a crossing-over between most locations (Figure 2b). In accordance with this, stability rankings were quite different from the performance rankings of the genotypes for seed yield across all locations (Table S1). However, the correlation between stability variance and seed yield was significant (p < 0.05) and negative at a moderate level (R = −0.34) (Figure 4a). This indicates a trend of higher-yielding genotypes tending to show higher yield stability across locations. However, the highest-yielding genotype G35 only reached the rank of 28 out of 50 in stability. Genotype G37, by contrast, ranked third in mean seed yield and second in stability, resulting in G37 being the closest to an ideal genotype for seed yield in our study. It should be noted that the rankings by performance were not identical to the respective GGE biplot (Figure 3b) but showed minor rank changes. This is likely due to the different methods of the metan package [22] to estimate the mean performance of each genotype.
Similar to the results for seed yield, protein content and its stability variance were moderately negatively correlated at R = −0.31 (Figure 4b). Stability variances for both traits were not significantly correlated with each other (Figure 4c). For protein content, only two top-performing genotypes were found, G38 in all locations but MOS, where G42 had the highest protein content (Figure 2c,d). Stability analysis according to Shukla's stability variance, however, classified G38 as highly unstable with the rank 49 out of 50. Following the concept of dynamic stability, stability variance measures how the genotype performance changes relative to the environmental mean (Equation (1)). Genotype G38 did not follow this pattern. In consequence, despite being the superior genotype in almost all locations and in addition to showing high performance in MOS, G38 is not classified as stable. Notably, genotypes that vary around the location means will be classified as unstable, but also a genotype that performs better than the mean at every location. So from a practical point of view, classifying G38 as unstable is misleading since the main interest of a breeder lies in the consistently high performance of a genotype. Accordingly, it should be noted that trait performance (especially yield) is the most important breeding objective while stability measures can only support selection decisions. Notably, if the number of environments in a multi-environment trial is large enough, the estimation of mean performance across environments will automatically penalise unstable genotypes that vary around the environment means [23]. It should be noted here, that this study focuses on the genotypic stability across locations in one test year and therefore cannot provide conclusions about the trait stability across years.

Expansion of Soybean Cultivation to Higher Latitudes Is Possible
The mean seed yield across locations within STH was 18.6% higher than across locations within NTH (Table 2). However, the significant (p < 0.05) positive correlation (R = 0.59) between seed yield in NTH and STH (Figure 5c) suggests that genotypes with high performance in STH tended to also show high seed yield relative to the other genotypes in NTH. This is further highlighted and supported by the genotype mean comparison shown in Figure 5b. Despite the rank changes due to crossing-over genotype-by-location interaction between NTH and STH, high-ranking genotypes in STH tended to also rank higher in NTH in many cases. Generally, crossing-over type genotype-by-environment interaction can be met by different selection decisions between target regions or mega environments [17,23,24]. However, within mega environments or target regions, genotype-by-environment interaction should be minimised. In our case, this goal was reached with the environmental target regions, as the ratio of genotype-by-location interaction to genotypic variance (σ 2 GL : σ 2 G ) decreased from 0.40 across all locations to 0.28 within NTH and 0.22 within STH.
When evaluating genotypes in both environmental target regions for high seed yield and yield stability in the year 2021, it appears that different selection decisions should be taken for NTH and STH (Figure 5d,e). The highest-performing genotype (G35) was the same in both target regions. However, for example, G8 was just as suitable for STH as G35, while it merely showed seed yield close to the mean in NTH. Consequently, despite the single-year analysis, the results indicate that selection decisions made for one environmental target region cannot confidently be transferred directly to another. To further explore this, we compared the mean seed yield of the five highest-ranking genotypes across all locations to the mean of the five highest-ranking genotypes within each of the two regions. Selecting for specific adaptation led to a 0.40% yield increase in STH compared to selection across all locations. For NTH, by contrast, the yield increase was higher at 2.32%. This analysis must, however, be treated with caution, due to the higher number of locations in STH. This is also an issue when analysing mean seed yield and stability across all seven locations (Figure 3b). Rankings and stability depictions are similar to those observed for STH. As an environmental target region consisting of five locations compared to NTH with only two locations, STH is overrepresented in the analysis across all locations. Keeping this in mind, our results indicate that separate analysis of environmental target regions enables more accurate selection against genotype-by-location interaction. However, short environment vectors for GÜL and BID (Figure 3a), proportional to the smaller variation within those locations (Figure 2b), show that both locations were not discriminating. This means that GÜL and BID were less suitable as test locations for the specific set of genotypes in this study compared to the other five locations [25,26]. One possible explanation could be less favourable environmental conditions during the growth period (Table 1), as especially in GÜL many genotypes did not fully mature before the field had to be harvested. However, we could not observe any single environmental factor explaining the substantially lower yields in NTH (Table 1). In consequence, the lower yields likely resulted from a combination of environmental conditions, including latitude.
The cause of genotype-by-environment interaction and trait instability are different sensitivities to environmental factors. In soybean, yield changes are often due to differences in tolerances to cooler temperatures [27][28][29] and photoperiod [11,30]. The aim of the differentiation between NTH and STH as environmental target regions was to mainly capture the effect of different latitudes on seed yield. In the north of Germany, due to typically lower temperatures and longer day length, earlier maturing genotypes are required compared to southern Germany in order to maximise the yield potential [29][30][31]. This generally indicates the need for selecting different genotypes for different latitudinal regions, which warrants further research. The majority of the genotypes evaluated in this study stem from a breeding program developed in the south of Germany at the location of EWE. Consequently, they were previously selected for high performance in that region. A question of practical relevance is therefore, whether we can successfully select soybeans in the south for cultivation in the north of Germany. Despite the correlation between the two regions, we found that evaluations made in the south did not sufficiently reflect the performance at higher latitudes. Consequently, specific selection decisions made separately in each of those regions appear necessary in soybean breeding. Nevertheless, the trend highlighted in this study needs to be validated by further research, also taking the effect of different growing seasons into account. Further, in practice, selection within mega environments or environmental target regions is only appropriate if the response to selection can be substantially increased [24]. Alternatively, an in part shared selection, for example, for simple traits in early generations, may be recommended.

Conclusions for Soybean Breeding at Higher Latitudes
This study illustrates that soybean can be cultivated in different regions in Germany, covering a wide range of latitudes from south to north. Notably, however, the suitability of genotypes for cultivation in southern Germany and more northern regions differed in part substantially. Consequently, the trend observed for the year 2021 indicates that different selection in breeding programs specifically targeting different regions appears beneficial. Key environmental influences affecting performance and therefore also genotype-by-environment interaction and stability in soybean are latitude, temperature, and precipitation [27][28][29][30][31][32]. Notably, to reliably classify mega environments, knowledge about the stability of the genotype-by-location interaction patterns over several years is necessary [17]. Patterns that are not repeatable across years cannot be utilised [23] and are therefore of no use for specific cultivar selection. Therefore, further research evaluating multi-environment trials comprising several years is crucial for a deeper understanding of these trends. Consequently, in addition to further exploring the environmental target regions defined in this study, it would be interesting to investigate the patterns of genotypeby-location interaction more thoroughly by evaluating a larger number of locations in different regions covering a wide range of latitudes across several years. An unconventional but promising approach to do so is citizen science which allows for a dense coverage of locations across an entire country or even across countries [33]. This would also offer further insights into the significance of latitude and specific weather conditions in the formation of environmental target regions and mega environments relevant to soybean breeding at higher latitudes.

Plant Material and Field Trials
We evaluated a total of 50 genotypes with favourable tofu quality, including eight registered cultivars (Table S3). The field trials were conducted in 2021 at seven locations in Germany: EWE, HOH, MOS, NIT, LDH, BID, and GÜL ( Figure 1, Table 1). Further two trial locations in Beckum (BEC) and Kiel (KIL) could not provide data, as at these locations the soybeans did not reach maturity due to bird pests and unfavourable weather conditions throughout the year. Weather data for the locations MOS and NIT were obtained from Agrarmeteorologie Bayern [34]. Weather data for the locations in Baden-Württemberg (EWE and HOH) were obtained from Agrarmeteorologie Baden-Württemberg [35]. For LDH, BID, and GÜL weather data were obtained from the Climate Data Center [36].
The trials were designed in an alpha lattice design with two replications, ten blocks per replication, and five genotypes per block. The soybeans were sown in four rows in plots of 6 m length and 1.5 m width. Standard agricultural practices for soybean production were followed. Seed yield in dt ha −1 was measured on the combine harvester and corrected for seed moisture content. Protein and oil content in percent were measured on dried seed (9-10% moisture content) in the laboratory with near-infrared spectroscopy (NIRS) using a Polytec PSS 2120 diode spectrometer and the corresponding software PSS-HOP (Polytec GmbH, Waldenbronn, Germany). Days to maturity were assessed as days between sowing and maturity determined by dry pods rustling when touched (growth stage R8). Further, plant height in cm and kernel dry matter in percent were measured.

Phenotypic Analysis
BLUPs and variance components were calculated across all locations and across locations within environmental target regions with the model y ijkm = µ + g i + l j + gl ij + r jk + b jkm + ε ijkm where y ijkm describes the phenotypic observation, µ is the intercept, g i is the effect of genotype i, l j is the effect of location j, gl ij is the interaction between genotype i and location j, r jk is the effect of replication k within location j, b jkm is the effect of block m within replication k in location j, and ε ijkm is the corresponding residual error term. BLUPs and variance components within locations were calculated with the model y ijk = µ + g i + r j + b jk + ε ijk where y ijk describes the phenotypic observation, µ is the intercept, g i is the effect of genotype i, r j is the effect of replication j, b jk is the effect of block k within replication j, and ε ijk is the corresponding residual error term. For both models, the variance components were tested for significance using likelihood ratio tests. Heritability (h 2 ) and repeatability (w 2 ) were calculated as proposed by Cullis et al. [37] and described by Piepho and Möhring [38]: where ϑ BLUP describes the mean variance of the difference between two BLUPs and σ 2 g is the genotypic variance. The sole difference between these values is the calculation across environments (h 2 ) and the calculation within environments (w 2 ). Standardised values were calculated following the min-max normalisation. Correlations were calculated as Pearson's product-moment correlation between pairwise complete observations.

Genotype-by-Location Interaction and Stability Analysis
Environmental target regions were defined by their latitude as an important component of soybean adaptation. For this, locations were separated into two groups by their geographic location: GÜL and BID in the northern half of Germany (NTH) and EWE, HOH, MOS, NIT, and LDH with rather similar latitudes in the south of Germany (STH).
Genotype plus Genotype-by-Environment (GGE) biplot is an approach proposed by Yan et al. [39]. It aids the interpretation of genotype-by-environment interaction by visualising genotype-by-environment interaction patterns in the form of a principal component analysis (PCA) biplot [26,39]. GGE biplot can be used to investigate the relationships between environments and the differences between genotypes across environments [25,26]. Further, different GGE biplots can be used to visualise the stability of genotypes and aid in selection decisions by combining performance and stability parameters within a single plot.
Genotype-by-location interaction was analysed using GGE biplot analysis [25,26] based on the equation where y ij is the mean performance of genotype i in location j, λ 1 , and λ 2 are the singular values, ξ i1 , and ξ i2 are the eigenvectors of genotype i, and η 1j and η 2j are the eigenvectors of location j for the two largest principal components (PC1 and PC2), respectively, and ε ij is the corresponding error term. All plots were made without scaling and using environmentcentering. To evaluate the relationships between locations, singular values were partitioned into the locations (SVP = 2). In order to compare genotypes, singular values were partitioned completely into the genotypes (SVP = 1).